ABSTRACT

In  recent  years  there  has  been  a  rapid  growth  of  interest  in  the 
development  of  stochastic  models  of  consumer  brand  choice  behavior.   The 
present  paper  is  concerned  with  the  development  and  empirical  testing 
of  a  viable  stochastic  model  of  brand  choice.   In  its  present  form  the 
model  considers  a  two  brand  market. 

The  present  formulation  is  a  probability  diffusion  model.   That  is, 
an  individual  consumer's  probability  of  purchasing  brand  A  changes 
through  time  according  to  a  diffusion  process.   Hence,  a  consumer's 
brand  choice  probability  is  non-stationary.  The  brand  choice  probability 
of  each  consumer  changes  through  time  according  to  the  same  stochastic 
diffusion  process.   Yet  at  any  point  in  time,  each  consumer  has  his  own 
unique  brand  choice  probability.   That  is,  at  any  time  consumers  are 
heterogeneous  with  respect  to  their  probability  of  purchasing  brand  A. 
In  sum,  the  diffusion  model  implies  that  with  respect  to  brand  choice 
probability  a  market  is  composed  of  heterogeneous,  non-stationary 
individuals. 

The  diffusion  model  is  derived  as  a  limiting  case  of  what  James  S. 
Coleman  has  termed  latent  Markov  models.   From  a  system  of  axioms  similar 
in  nature  to  those  commonly  utilized  in  stimulus  sampling  theory,  a 
general  latent  Markov  model  is  formulated.  Two  alternative  specifications 
of  this  general  model  are  examined  in  detail.   The  independent  elements 
specification  is  shown  to  be  an  unsatisfactory  model  of  consumer  brand 
choice.   However,  the  cohesive  elements  specification  is  shown  to  yield 
a  viable  stochastic  model  of  choice  behavior. 
This Cohesive Elements Model is shown to be a probability diffusion
model  in  its  limiting  or  infinite  element  form.   In  addition,  Coleman's 
contagious  binomial  distribution  is  shown  to  go  to  a  beta  form  in  the 
infinite  element  case. 

Two  alternative  procedures  for  estimating  and  testing  the  model  are 
presented.   The  recursive  regression  procedures,  while  having  the  advantage 
of  being  computationally  convenient,  may  yield  somewhat  ambiguous  measures 
of  the  "goodness  of  fit"  of  the  model  to  any  set  of  empirical  data.   In 
addition,  the  small  sample  properties  of  the  parameter  estimates  are 
problematical.   The  minimum  chi  square  estimation  method  yields  a  single, 
unambiguous  measure  of  the  "goodness  of  fit"  of  the  model  in  addition 
to  parameter  estimates  which  have  desirable  statistical  properties.   The 
highly  non-linear  nature  of  the  chi  square  statistic  makes  analytic 
solution  for  the  minimum  value  difficult.   A  numeric  grid  search  procedure 
is  proposed  and  used  in  the  empirical  case  which  is  explored. 

The model is tested on MRCA National Consumer Panel data for dentifrice
in the periods just before and just after the American Dental
Association's endorsement of Crest. The model is shown to yield a
remarkably good "fit" in the stable period before the endorsement
as well as in the transient period after it. In addition, the model implies
that  the  A.D.A.  endorsement  enhanced  Crest's  retentive  power  as  a  brand 
even  more  than  it  enhanced  its  attractive  power. 
Chapter I
INTRODUCTION 

1.1  Models  in  Marketing 

The  past  fifteen  years  have  seen  a  rapid  growth  of  interest  in  the 
development  of  models  for  marketing  phenomena.    Model  building  and 
testing  has  come  to  be  seen  as  a  necessary  element  in  the  emergence 
of  a  science  of  marketing. 

Models have been developed for analyzing consumer brand switching
behavior, allocation of selling expense, allocation of advertising
expenditures, and forecasting sales, to mention a few of the areas in which
work has been done [2]. As the pace of technological change accelerates
and as new product failure rates continue at their shockingly high
levels [3], more and better models will be sought to deal with the
increasingly complex and costly problems which arise in marketing.

Models are generally developed with one or more of the following
purposes in mind: understanding, prediction, and control. Models may
contribute to our understanding of a process by relating a well specified
representation of the process to the observable events which constitute
the data in any empirical situation. For example, we might suppose
that consumers make their brand choices in a product class according
to a Bernoulli process. While we know that this model is unlikely to
be precisely true, we would like to know whether or not it is a
reasonable representation. The reasonableness of the model may be tested
by whether or not it is consistent with the overt behavior which we
have observed as our data. If the model turns out to be reasonable,
we will, we hope, then have a better understanding of the process.

Prediction  is  also  an  important  function  of  models.   In  a  world  where 
the  processes  generating  social  behavior  are  subject  to  frequent  change, 
model  predictions  are  often  the  only  means  by  which  we  may  infer  certain 
implications  of  the  current  state  of  the  process.   For  example,  Markov 
chains  have  been  used  to  predict  equilibrium  market  shares  under  the 
condition of ceteris paribus. Since it is the rare market that
experiences no changes in its underlying process due to the ebb and flow of
competitive activity, the process will probably change before it has
reached equilibrium. The equilibrium market share inferences from the
Markov chain model therefore contribute information which is not
available from the data in its raw form.
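As a minimal sketch of the equilibrium share calculation mentioned above, consider a two brand market described by a first-order Markov chain. The function names and the transition probabilities below are hypothetical, chosen only for illustration; they are not taken from any of the studies reviewed here.

```python
def equilibrium_share(p_aa: float, p_ba: float) -> float:
    """Long-run share of brand A for a two-state, first-order Markov chain.

    p_aa: probability of buying A next, given A was bought last time.
    p_ba: probability of buying A next, given the other brand was bought.
    """
    # The stationary share s solves s = s * p_aa + (1 - s) * p_ba.
    return p_ba / (1.0 - p_aa + p_ba)


def project_share(share: float, p_aa: float, p_ba: float, periods: int) -> float:
    """Share of brand A after a given number of purchase occasions."""
    for _ in range(periods):
        share = share * p_aa + (1.0 - share) * p_ba
    return share


# Hypothetical transition probabilities, chosen only for illustration.
P_AA, P_BA = 0.8, 0.3
print(equilibrium_share(P_AA, P_BA))       # long-run share of brand A
print(project_share(0.2, P_AA, P_BA, 5))   # share after five purchase occasions
```

The projected share moves toward the equilibrium value only so long as the transition probabilities stay fixed, which is the ceteris paribus condition noted in the text.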

Control of a process, of course, may be seen as the ultimate
objective of model building. In general, control must await developments
in prediction and understanding of the process with which we are
concerned.

The  present  report  is  focused  upon  stochastic  models  of  consumer 
brand  choice.   The  primary  objectives  at  this  point  are  understanding 
of the process which generates consumer brand choice and prediction of
dynamic  market  behavior. 

An  understanding  of  consumer  choice  behavior  is  of  clear  importance 
to  the  marketing  manager.   It  seems  reasonable  to  assume  that  marketing 
managers  would  prefer  to  base  policy  decisions  upon  how  consumers  actually 
behave  rather  than  upon  normative  economic  prescriptions  which  tell 
how  the  rational  consumer  ought  to  behave. 
It is anticipated that work on stochastic models of consumer behavior
will prove useful to market researchers and marketing managers in the
following endeavors:

1. Development of more effective bases of market segmentation;

2. Development of micro-simulations of consumer behavior;

3. Measurement of the dynamic impact of marketing decisions;

4.  Measurement  of  dynamic  market  response  for  use  in  market 
experimentation; 

5.  Development  of  dynamic  models  for  use  in  monitoring  markets  for 
adaptive  marketing  systems. 

1.2 Models of Consumer Brand Choice: Review and Comment

1.2.1  General  Considerations 

As Massy (1966, p. 1) has pointed out, stochastic models of consumer
brand choice have the advantage of allowing for the myriad of stimuli
that impinge on brand choice decisions by means of the simple but
appealing mechanism of response uncertainty. By response uncertainty we
shall mean that responses (or brand purchases) are probabilistically
determined.

Several  factors  complicate  the  life  of  the  researcher  seeking  to 
build  and  test  stochastic  models  of  consumer  behavior.   They  may  be  listed 
as follows:

1. There may be a many-to-one mapping of models into a set of data;

2. Individual consumers tend to be heterogeneous in their brand
choice behavior;


3.  The  stochastic  process  generating  a  sequence  of  brand  decisions 
may  itself  undergo  change; 

4.  The  combining  of  classes  problem  which  arises  when  an  N  brand 
market  is  collapsed  into  a  two  brand  market. 

For  a  readable  discussion  of  these  problems  along  with  simple  numerical 
examples,  the  reader  is  referred  to  Morrison  (1965b,  pp.  4-9). 

In  the  next  few  subsections  we  shall  review  a  portion  of  the  work 
which  has  been  done  on  models  of  consumer  behavior.   This  is  not  in  any 
sense  an  exhaustive  review  but  rather  is  intended  to  be  illustrative  of 
work  which  has  been  done. 

1.2.2  Early  Work  on  Consumer  Brand  Choice 

Brown.   Using  the  purchasing  records  of  100  families  from  the  Chicago 
Tribune  consumer  panel  for  the  year  1951,  Brown  (1952)  studied  brand 
loyalty  behavior  toward  certain  frequently  purchased  products  such  as 
toothpaste,  margarine,  coffee,  soap,  etc.   His  measure  of  brand  loyalty 
depended  upon  the  number  and  pattern  of  purchases  of  different  brands 
during  the  year.  Based  upon  his  operational  measure  of  brand  loyalty, 
Brown  classified  households  as  having:   undivided  loyalty,  divided  loyalty, 
unstable  loyalty,  or  no  loyalty.   Morrison  (1965b)  has  observed  that 
Brown's  measure  of  brand  loyalty  doesn't  even  necessarily  satisfy  the 
weakest  necessary  condition  for  a  meaningful  scale,  the  property  of 
transitivity.   While  hindsight  and  developments  during  the  ensuing  years 
may  tend  to  make  one  overcritical,  Brown's  work  did  reveal  that  consumers 
concentrate  their  purchases  much  more  than  had  been  previously  expected. 
Cunningham. The definition of brand loyalty was sharpened in a later
study by Cunningham (1956). He operationally defined brand loyalty as
the proportion of total purchases within a product class that a
household devotes to its favorite or most frequently purchased brand. In a
subsequent  study  Cunningham  (1961)  defined  store  loyalty  in  an  analogous 
manner.   The  research  hypotheses  in  these  studies  centered  about  the 
postulated  existence  of  brand  or  store  loyalties.   The  null  hypothesis 
in  these  studies  was  that  brands  would  be  purchased  and  stores  visited 
on  an  equiprobable  basis.   That  is,  the  null  hypothesis  was  that  no 
propensity  to  purchase  particular  brands  or  to  shop  in  particular  stores 
exists.   The  Chicago  Tribune  panel  once  again  served  as  the  data  base 
for  these  studies. 

The results of these studies are summarized below:

1. Significant brand loyalty exists within product classes
(intra-class loyalty).

2.  Loyalty  proneness  or  the  propensity  to  be  brand  loyal  across 
product  classes  does  not  exist. 

3.  Store  and  brand  loyalty  are  not  significantly  related. 

4.  Purchases  on  deals  tend  to  be  concentrated  among  households 
having  low  brand  loyalties. 

5.  Consumption  and  brand  loyalty  are  unrelated. 

6.  A  household's  time  in  the  panel  does  not  relate  to  its  loyalty 
behavior. 

7.  There  is  more  store  loyalty  generated  toward  chain  stores  than 
toward  specialty  stores  or  independents. 
Cunningham's results suggest that families concentrate their brand
and store choices to a far greater extent than that expected under the
equiprobable chance model.

In terms of the model classifications used below, Cunningham
implicitly assumed that the panel members made their brand choice
decisions according to a stationary Bernoulli process.

1.2.3  Bernoulli  Models 

Frank.   Frank  (1962)  analyzed  the  August,  1957  through  September, 
1958  regular  and  instant  coffee  purchases  from  the  Chicago  Tribune  panel 
in  an  effort  to  test  whether  consumer  brand  choice  can  be  described  by 
a  zero  order  process  at  the  household  level.   He  argued  that  some  of 
the  "learning"  effects  which  had  been  found  by  Kuehn  (1958)  may  in  fact 
be  spurious  as  a  result  of  the  aggregation  of  heterogeneous  consumers 
whose  true  brand  switching  processes  are  zero  order. 
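Frank's aggregation argument can be illustrated with a small calculation. The sketch below assumes a hypothetical population made up of two segments of strictly zero-order (Bernoulli) consumers; the segment weights and probabilities are invented for illustration. Even though no individual's probability depends on his last purchase, the pooled data show a much higher probability of buying brand A after a purchase of A than after a purchase of another brand.

```python
def aggregate_conditionals(segments):
    """Pooled conditional purchase rates for a mix of Bernoulli consumers.

    segments: list of (weight, p) pairs, where p is a segment's constant
    probability of buying brand A and the weights sum to one.
    Returns (P(A now | A last time), P(A now | other brand last time)).
    """
    mean_p = sum(w * p for w, p in segments)
    mean_p_sq = sum(w * p * p for w, p in segments)
    # Consumers who bought A last time are disproportionately high-p consumers.
    p_a_given_a = mean_p_sq / mean_p
    # Consumers who bought another brand are disproportionately low-p consumers.
    p_a_given_other = (mean_p - mean_p_sq) / (1.0 - mean_p)
    return p_a_given_a, p_a_given_other


# Hypothetical population: half the consumers have p = 0.9, half have p = 0.1.
segments = [(0.5, 0.9), (0.5, 0.1)]
repeat_rate, switch_in_rate = aggregate_conditionals(segments)
print(repeat_rate, switch_in_rate)
```

The gap between the two pooled rates arises entirely from heterogeneity, which is why an aggregate analysis can suggest "learning" that no individual household exhibits.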

A sequence of twenty consecutive purchases from each household was
tested by the Wald-Wolfowitz run test [4] to see whether the sequence could
have been generated by a stationary Bernoulli process. By stationary it
is meant that the household has a constant probability of purchasing
brand A over the twenty trials. Frank found that much of the brand
choice behavior was consistent with the Bernoulli trials hypothesis.
However, he did find that there were too many families with long runs.
This result may be due to one of the following two factors:

1. the process generating the observations is non-Bernoulli;

2. the process is basically Bernoulli in the short run, but the
Bernoulli parameter is non-stationary over longer purchase
sequences.

To the extent that the second factor is operating, we might erroneously
reject a true Bernoulli process which is non-stationary. Morrison (1965b)
has pointed out yet another pitfall in the run test in addition to the
non-stationarity issue. He points out that for sequences of twenty
responses the run test is not particularly powerful. Consequently the test
may accept as Bernoulli processes which are rather far from Bernoulli
in nature. In spite of these limitations in statistical procedure, Frank's
work raises an important issue--aggregation--with respect to inferences
concerning the order of consumers' brand choice processes.
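The run test discussed above can be sketched as follows. This is an illustrative implementation using the usual large-sample normal approximation, not a reconstruction of Frank's own computations; the function name and the purchase sequence are hypothetical. With only twenty observations the approximation is rough, which is related to the lack of power that Morrison notes.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a two-valued purchase sequence.

    Returns (runs, expected_runs, z), where z uses the normal approximation.
    """
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two distinct brands")
    n1 = sum(1 for s in sequence if s == labels[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts at the first purchase and at every change of brand.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if variance <= 0.0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical sequence of twenty purchases: "A" is brand A, "B" any other brand.
purchases = "AAAAABBBBBAAAAABBBBB"
runs, expected, z = runs_test(purchases)
print(runs, expected, z)   # a strongly negative z means fewer, longer runs than expected
```

A sequence with long runs produces fewer runs than a stationary Bernoulli process would be expected to generate, and hence a negative test statistic.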

Massy and Frank. Using the techniques of factor analysis and
simulation, Massy and Frank (1964) investigated the relationship between overt
measures of consumers' brand and store switching behavior and the
underlying structure of these switching processes. The empirical data used
in  this  study  were  drawn  from  the  family  by  family  purchase  records  of 
the  J.  Walter  Thompson  consumer  panel  for  the  period  between  July,  1956, 
and  June,  1957.   Three  product  categories  were  analyzed:   coffee,  tea,  and 
beer. 

For each product class, and within product class by family, they
developed twenty-nine raw purchasing statistics such as number of brand
runs, number of store runs, average length of brand runs, and average
length of store runs. These statistics were then factor analyzed by
principal components in order to attempt to identify the basic dimensions
of loyalty and activity. From this analysis the first principal
component and the first four varimax rotated factors were identified
respectively as:

1.  General  loyalty  to  stores  and  brands 

2.  Activity 

3.  Brand  loyalty 

4.  Store  loyalty 

5. Consistency with respect to second and third favorite brands.

They then constructed a simulated population of consumers in which
each actual family in the original analysis would be represented by a
particular  simulated  family.   The  simulated  families  had  zero-order  brand 
and  store  switching  processes  with  quantity  purchased  determined  by  a 
Poisson  distribution  with  the  minimum  purchase  set  at  one  unit.   The 
parameters  of  each  simulated  family  were  estimated  from  the  purchase 
history  of  the  corresponding  actual  family. 

The  simulated  families  were  then  run  through  two  simulated  purchase 
histories:  one  having  the  same  number  of  purchases  as  the  corresponding 
actual  family  had  made  during  the  year,  the  other  having  three  times  the 
actual number of purchases. The same twenty-nine raw purchasing
statistics were computed for each simulated family in these runs as had been
done  for  the  actual  purchase  records.  These  statistics  were  then  factor 
analyzed  for  each  simulation  run.  The  factor  profiles  generated  by  the 
simulated  zero-order  families  were  then  compared  with  the  factor  profile 
from  the  actual  data. 

Comparison  of  the  actual  factor  profile  with  the  two  simulated  factor 
profiles  led  the  authors  to  conclude  that: 
1. For coffee and tea, brand switching behavior is not distinguishable
from a zero-order process.

2.  For  beer,  brand  switching  seems  to  be  a  higher  order  process. 

3.  Store  switching  behavior  for  coffee,  tea,  and  beer  appears  to 
be  adequately  described  by  a  zero  order  process. 

While  it  is  not  clear  how  sensitive  the  factor  profiles  might  be  to 
deviations  from  the  Bernoulli  assumption,  the  method  used  in  this  study 
is  interesting  in  that  it  attempts  to  explore  the  manner  in  which  the 
underlying structure of brand and store switching behavior affects common
summary statistics of brand and store choice behavior. Since the
statistics  are  generated  on  a  family-by-family  basis,  this  method  avoids 
the  aggregation  problem. 

Massy.  Massy  (1966)  has  examined  the  order  and  homogeneity  of  the 
brand  switching  process  for  specific  families.  The  data  base  for  this 
study  was  the  Chicago  Tribune  panel  purchase  records  for  regular  coffee 
between  January,  1956,  and  February,  1959. 

Since he wanted to study the family specific process, it was
necessary to screen the families for purchase frequency. Of the original sample
of 800 families, 215 families had purchased regular coffee sufficiently
often to be included in the study. Since the inference methods used
depend upon stationarity of the process generating the observed time series,
it was necessary to further screen the families for stationarity. Massy
found  that  out  of  the  sample  of  215  families  that  had  passed  the  purchase 
frequency  test  only  39  could  also  meet  the  stationarity  requirement. 
For this sample of 39 frequent and stationary purchasers of regular
coffee he then computed family specific transition matrices as well as
an aggregate transition matrix for all 39 families combined. These
transition matrices had three states: favorite brand, second favorite brand,
all other brands. Favorite and second favorite brands were defined by
purchase  frequencies  over  the  entire  data  period. 

Using the appropriate Anderson and Goodman (1957) test on the
aggregate matrix, he found that the null hypothesis of a zero order process
could be rejected in favor of a higher order process at the 99%
confidence level. Thus the aggregate switching matrix suggests that the
process  is  of  higher  order  than  zero. 
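A test of a zero order process against a first order alternative can be sketched from a table of transition counts. The code below computes a Pearson chi-square statistic for independence between the previous and the current state; whether this matches the exact form of the Anderson and Goodman test that Massy applied is an assumption, and the function name and the counts in the example are hypothetical.

```python
def zero_order_chi_square(counts):
    """Pearson chi-square statistic for independence of successive purchases.

    counts[i][j] is the number of observed transitions from state i to
    state j. Under a zero-order process every row of the transition matrix
    is the same, so the table should resemble an independence table.
    Returns (statistic, degrees_of_freedom).
    """
    m = len(counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(m)) for j in range(m)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(m):
        for j in range(m):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (counts[i][j] - expected) ** 2 / expected
    return statistic, (m - 1) ** 2


# Hypothetical counts for the three states used in the text:
# favorite brand, second favorite brand, all other brands.
counts = [
    [60, 15, 10],
    [14, 20, 8],
    [11, 7, 12],
]
statistic, dof = zero_order_chi_square(counts)
print(statistic, dof)   # compare with a chi-square table at the chosen level
```

A large statistic relative to the chi-square distribution with the stated degrees of freedom leads to rejection of the zero order hypothesis.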

Since others (Frank 1962 and Morrison 1965b) had warned of the
danger of inferring the order of a process at the individual level from
results based upon aggregate transition matrices, Massy then sought to
infer the order of the process for each family from its own transition
matrix. Based upon this disaggregative data, he found that in only 6
out of the 39 cases would the null hypothesis of a zero order process
be rejected in favor of a higher order process even at the relatively
loose 90% confidence level. Recognizing that this doesn't necessarily
establish the validity of a zero order switching process for regular
coffee consumption, Massy notes:

... if we had all the data in the world we would be surprised if the
probabilities of purchasing different brands are serially independent.
The real question at issue is whether the departures from a zero
order process are consistently serious enough to warrant using the
more complicated first order model to describe brand switching
behavior. The results suggest that if we do not have strong a priori
views about the matrix for a given family, the size of the departure
from a condition of independent trials is probably small relative
to the sampling errors that are likely to be obtained when we
estimate the relevant parameters.

In summary, Massy's results indicate four things. First, stationarity
is the exception rather than the rule. Second, consumers differ markedly
in their brand switching transition matrices. Third, inferences concerning
the order of family specific processes from aggregate transition matrices
are extremely sensitive to the assumptions of homogeneity and stationarity
and consequently are very tenuous. Finally, for regular coffee
a zero order switching model seems to suffice. This latter point is
consistent  with  Frank's  (1962)  results  for  regular  coffee  as  well  as 
the  previously  discussed  Massy  and  Frank  (1964)  results. 

Morrison. Morrison (1965b) has developed statistical tests and
estimation procedures for heterogeneous populations of consumers whose
brand choice behavior in the short run may be described by Bernoulli
trials. He assumes that each consumer, say consumer i, has some
probability $p_i$ of purchasing brand A versus all other brands on each purchase
occasion. The postulated Bernoulli process for each individual is
assumed to be stationary in the short run. That is, the $p_i$ for each
individual remains constant over a few trials. The population is also
assumed to be heterogeneous, which in this case means that $p_i$ is
distributed across the population of consumers. He provides statistical
procedures for both arbitrary and beta distributions of $p_i$.

Since his Bernoulli trials formulation turns out to be a special
case of one of his Markov models, we shall defer discussion of his
empirical results until the section on Markov models of consumer brand
choice.
1.2.4 Linear Learning Models

Kuehn. Kuehn (1958 and 1962) used a modified form of Bush and
Mosteller's linear learning theory as a model of consumer brand choice.
The  linear  learning  model  assumes  that  a  purchase  of  a  brand  will  increase 
its  probability  of  being  repurchased.   The  model  is  outlined  below. 

Suppose we have a two brand market. One of these two brands might
well be a brand of particular interest to us--brand A--while the other
brand may simply represent all other brands. In the two brand market
it is sufficient to consider $P(A_t) = P_t$, the probability of purchasing
brand A at time t, where time indexes purchase events. As we mentioned
above, the linear learning model assumes that the actual response made
at time t affects the probability of purchase at time t + 1. In
particular, the following pair of linear equations summarizes this assumption:

$P_{t+1} = \alpha_1 + \beta_1 P_t$ if brand A was purchased at time t

$P_{t+1} = \alpha_0 + \beta_0 P_t$ if some other brand was purchased at time t

Note that while $P_{t+1}$ depends upon $P_t$ and the response made at time t, $P_t$
itself summarizes the influence of past purchases. These linear
equations are depicted graphically in Figure 1-1.

Suppose we take a consumer who has probability $P_0$ of purchasing brand
A at time zero. That is, we are at $P_0$ in Figure 1-1. Now suppose further
that he purchases brand A at time zero. Since an actual purchase of
brand A is assumed to enhance his probability of purchasing brand A on
the next trial, we find the probability of his purchasing brand A at
time one by reading from $P_0$ up to the purchase operator and then over
to $P_1^p$. The superscript denotes a purchase of brand A in the previous
trial. Similarly, if he purchased some other brand at time zero, his
probability at time one would be $P_1^r$ as read from the rejection operator.
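FIGURE 1-1

KUEHN'S LINEAR LEARNING MODEL

[Figure not reproduced: $P_{t+1}$ plotted against $P_t$, showing the purchase
operator (intercept $\alpha_1$, slope $\beta_1$) and the rejection operator
(intercept $\alpha_0$, slope $\beta_0$).]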

In Figure 1-1 we note that $P_t$ and $P_{t+1}$ are bounded by $L_A$ and $U_A$.
That is,

$L_A \le P_t \le U_A$

and

$L_A \le P_{t+1} \le U_A$

This implies that learning never goes to completion. That is, the
consumer is never absolutely certain to purchase one brand or the other.
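The two operators and their bounds can be sketched in a few lines. The code below assumes that the upper and lower bounds are the fixed points of the purchase and rejection operators (the points where each line crosses the 45-degree line), which holds when both slopes lie between zero and one and the initial probability lies between the bounds. The function names and parameter values are hypothetical and are not estimates from any of the studies discussed.

```python
def purchase_operator(p, alpha_1, beta_1):
    """Probability of buying brand A next, after a purchase of brand A."""
    return alpha_1 + beta_1 * p


def rejection_operator(p, alpha_0, beta_0):
    """Probability of buying brand A next, after a purchase of another brand."""
    return alpha_0 + beta_0 * p


def learning_bounds(alpha_1, beta_1, alpha_0, beta_0):
    """Fixed points of the two operators, returned as (lower, upper)."""
    upper = alpha_1 / (1.0 - beta_1)   # fixed point of the purchase operator
    lower = alpha_0 / (1.0 - beta_0)   # fixed point of the rejection operator
    return lower, upper


# Hypothetical parameter values, chosen only for illustration.
ALPHA_1, BETA_1 = 0.4, 0.5
ALPHA_0, BETA_0 = 0.1, 0.5

lower, upper = learning_bounds(ALPHA_1, BETA_1, ALPHA_0, BETA_0)
p = 0.5
for _ in range(10):                    # ten consecutive purchases of brand A
    p = purchase_operator(p, ALPHA_1, BETA_1)
print(lower, upper, p)                 # p approaches the upper bound from below
```

Under these assumptions, even an unbroken string of brand A purchases leaves the probability strictly below the upper bound, which is the sense in which learning never goes to completion.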
In most applications of the linear learning model it is also assumed
that $\beta_1 = \beta_0 = \beta$, i.e., that the slopes of the purchase and rejection
operators are equal. In this case, the influence of past purchases on
the present purchase probability is exponentially weighted, with the
most recent purchase having the greatest weight.
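The exponential weighting can be seen by unrolling the equal slope recursion. The indicator $x_t$ below is notation introduced here for illustration only: it equals one if brand A was purchased at time t and zero otherwise, so that the two operators collapse into a single equation.

```latex
% Assumption: x_t = 1 if brand A was purchased at time t, and x_t = 0 otherwise.
\begin{align*}
P_{t+1} &= \alpha_0 + (\alpha_1 - \alpha_0)\, x_t + \beta P_t \\
        &= \beta^{t+1} P_0
           + \sum_{k=0}^{t} \beta^{k} \left[ \alpha_0 + (\alpha_1 - \alpha_0)\, x_{t-k} \right]
\end{align*}
```

For a slope between zero and one, the purchase made k occasions ago enters with weight $\beta^k$, so the most recent purchase (k = 0) carries the greatest weight.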

Kuehn has not directly tested or estimated the linear learning model
described above in any of his published work. In his thesis (Kuehn, 1958)
he used factorial analysis in an attempt to isolate the effects of past
purchases. These results and others do, however, exhibit certain
characteristics which are consistent with the learning model of consumer
brand choice. This, of course, is not nearly as satisfying or
convincing as a direct test.

Kuehn  also  found  an  exponential  decay  in  repurchase  probability  as 
the  time  between  purchases  increases.  This  result  was  not  corroborated 
by  the  work  of  Carman  and  Morrison  described  below.   This  may,  of  course, 
be  due  to  the  fact  that  Kuehn  used  frozen  orange  juice  data  while  Carman 
and  Morrison  used  dentifrice  and  coffee,  respectively. 

Haines. Haines (1964) used a modified form of the linear learning
model as a model of market behavior after innovation. The modification
involved the rejection operator. Since the model is to be of a market
after innovation, it was felt that a no purchase or rejection trial
should not affect the probability of a purchase on future trials. That
is, a no purchase trial with respect to the innovation does not alter
the probability of a purchase on future trials.

For time series data from thirty-four geographic areas, Haines
developed aggregate market measures in terms of his modified learning
model. The measures were asymptotic market share for the new product
and  the  rate  at  which  the  market  was  approaching  its  asymptotic  value. 
The  model  generally  was  a  good  fit  to  the  data  with  no  significance 
level  greater  than  0.10. 

Haines  then  attempted  to  estimate  the  rate  of  approach  to  equilibrium 
or  rate  of  approach  to  asymptote  and  the  asymptotic  market  share  as  a 
function of various marketing policies. The rate of approach to
equilibrium was found to relate to the amount spent on advertising during
the  first  two  months  and  to  the  prior  availability  of  the  good.   The 
equilibrium  market  share,  adjusted  for  the  population  in  each  region, 
related  to  the  per  capita  promotional  expenditures.  A  crucial  assumption 
in  Haines'  model  formulation  is  that  there  be  no  ready  substitutes  for 
the  innovation.  He  examines  the  potential  bias  which  can  exist  when 
this  assumption  is  violated. 

Haines'  basic  approach  is  a  sound  one.  He  used  a  stochastic  model 
of  the  dynamics  of  consumer  choice  behavior  to  estimate  behavior  within 
selected  geographic  segments.   His  model  provides  summary  estimates  of 
the  behavioral  dynamics  within  each  segment  as  well  as  measures  of 
the  "goodness  of  fit"  of  the  model  to  the  data.   He  then  uses  these 
interpretable  model  parameters  as  the  dependent  variables  in  regressions 
on  market  decision  variables.   This  strategy  should  receive  increasing 
attention  in  attempts  to  model  markets  and  to  measure  the  impact  of 
marketing  policy  variables. 

Carman. Carman (1965) notes that Kuehn has published little
empirical evidence in support of the linear learning model and, further,
that the results which have been published do not directly test the
linear learning hypothesis. Using MRCA panel data for dentifrice
purchases in the period immediately subsequent to the American Dental
Association's endorsement of Crest in August, 1960, Carman addressed
himself to the following problems:

1. A direct test of the linear learning model in a product class
for which empirical evidence had not been published (i.e.,
dentifrice).

2. A test of Kuehn's finding of a decay in repurchase
probability as interpurchase time increases.

3. A test of Frank's hypothesis of spurious learning due to
aggregation of dissimilar consumers.

Carman used the special case of the linear learning model for which
the "purchase operator" and the "rejection operator" have the same slope.
As empirical observations for his estimation procedure, he developed
weighted empirical frequencies of the probability triplet
$(P_t, P_{p,t+1}, P_{r,t+1})$ where:

$P_t$ = probability of purchasing Crest at time t

$P_{p,t+1}$ = probability of purchasing Crest at time t+1 given that a
purchase of Crest was made at time t

$P_{r,t+1}$ = probability of purchasing Crest at time t+1 given that a
purchase of some other brand was made at time t.

These empirical frequencies were used to estimate the parameters of
the purchase and rejection operators by least squares regressions. Carman
derived his own set of normal equations, but he could just as well have
taken advantage of the fact that his equal slope constraint makes a
dummy variable approach feasible. His method also assumes homogeneity
among consumers in terms of their initial probability of purchasing
Crest. That is, $P_0$ is assumed the same for all consumers in one run
of the model.
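
The dummy variable approach mentioned above can be illustrated with a minimal sketch. It assumes unweighted ordinary least squares (Carman's weighting scheme is not reproduced here), and every function and variable name is hypothetical. Each probability triplet contributes two observations: one with the dummy D = 1 (previous purchase of the brand) and one with D = 0 (previous purchase of another brand), so the equal slope constraint is imposed automatically.

```python
# Hypothetical sketch: equal-slope linear learning operators fitted by
# unweighted least squares with a dummy variable. Names are illustrative.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_equal_slope(triplets):
    """Fit P_next = a_reject + d * D + slope * P_t by least squares.

    Each triplet is (p_t, p_next_given_purchase, p_next_given_rejection).
    D = 1 marks the purchase observation, D = 0 the rejection observation,
    so the purchase intercept is a_reject + d.
    """
    rows, ys = [], []
    for p_t, p_purchase, p_reject in triplets:
        rows.append([1.0, 1.0, p_t])
        ys.append(p_purchase)
        rows.append([1.0, 0.0, p_t])
        ys.append(p_reject)
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    a_reject, d, slope = solve_linear_system(xtx, xty)
    return {"a_reject": a_reject, "a_purchase": a_reject + d, "slope": slope}
```

With noise-free triplets generated from known operators, the fit should recover the two intercepts and the common slope; with real panel frequencies a weighted version would be closer to what the text describes.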

Within  the  limitations  of  his  operational  definitions  and  methods, 
the  dentifrice  data  do  not  appear  to  be  inconsistent  with  the  results 
predicted  by  a  linear  learning  model.  The  coefficients  of  determination 
ranged  from  0.67  to  0.99. 

In order to test Kuehn's time decay of repurchase probability
hypothesis, he segmented the households according to their average
interpurchase time in the dentifrice market. The predicted decay in
repurchase probability as interpurchase time increases did not occur in the
data. This finding of the insignificance of the time decay is
consistent with results obtained by Morrison (1965b) for regular coffee.

To  test  the  Frank  hypothesis,  Carman  identified  a  group  of  "switchers" 
defined  by  their  brand  purchase  sequence  on  the  three  purchases  prior 
to  the  period  under  study.   For  this  group  of  switchers,  he  concludes 
that the apparent learning could not all have been caused by
overaggregation. He bases this conclusion upon results of a regression
and  a  scatter  diagram  presented  as  Figure  3,  p.  21  of  Carman  (1965). 
From  the  scatter  of  points  in  the  diagram  it  is  difficult  to  see  how 
a  coefficient  of  determination  of  0.91  was  achieved. 

Carman also considered the projected equilibrium brand share for
Crest. For the period immediately following the endorsement, the brand
share projections were approximately 72% for all subgroups. For later
periods these projections dropped to more reasonable levels. This
suggests that the learning process itself, if it exists, is non-stationary.
That is, the intercepts $\alpha_R$ and $\alpha_P$ and the common slope
change over time, presumably under the impact of competitive reaction
to the Crest success.
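
To make the idea of a projected equilibrium share concrete, the following is a hedged sketch under the equal slope assumption. If the purchase operator maps P to a_purchase + slope * P and the rejection operator maps P to a_reject + slope * P, then the expected next-period probability is linear in P, and the mean share converges to a fixed point. The parameter names and the numerical values used in any call are illustrative assumptions, not Carman's estimates.

```python
# Hypothetical sketch: equilibrium share implied by equal-slope operators.

def expected_next_share(share, a_reject, a_purchase, slope):
    """Expected purchase probability one period ahead.

    P * (a_purchase + slope * P) + (1 - P) * (a_reject + slope * P)
    simplifies to a_reject + (slope + a_purchase - a_reject) * P.
    """
    return a_reject + (slope + a_purchase - a_reject) * share


def equilibrium_share(a_reject, a_purchase, slope):
    """Fixed point of the expected-share recursion."""
    rate = slope + a_purchase - a_reject
    if not 0.0 <= rate < 1.0:
        raise ValueError("recursion does not converge for these parameters")
    return a_reject / (1.0 - rate)
```

Because the fixed point depends on all three operator parameters, a projection that moves from one period to the next indicates that at least one of them has shifted.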

Massy. In all of the above applications of the linear learning
model, it is assumed that all respondents start with the same probability
of making response 1 versus response 0. Carman, of course, disaggregated
into somewhat more homogeneous groups here. In any case this
appears to be an artificial assumption. Massy (1965) has developed
procedures for estimating the linear learning model when the initial
response probability is distributed across the population of respondents.
He presents a minimum chi square as well as an approximate regression
procedure. The former is computationally burdensome, yet it yields a
single overall measure of the linear learning hypothesis while at the
same time providing best asymptotically normal estimates of the
parameters. In this case the asymptotic results are achieved as the
number of respondents or households entering the analysis is increased.
The approximate least squares procedure leads to difficulties in the
interpretation of the properties of the estimates. This paper makes a
substantial contribution on two fronts: (1) its content and results,
(2) its public presentation of estimation procedures for the linear
learning model. Such public presentation has been conspicuously absent
in the past.

1.2.5 Markov Models

Harary and Lipstein. Harary and Lipstein (1961) found it useful to
disaggregate  the  overall  transition  matrix  into  a  hard  core  matrix  and 
a  switchers  matrix.  This  procedure  was  also  used  earlier  by  Lipstein 
(1959).   The  hard  core  matrix  was  composed  of  all  those  households 
devoting  three-quarters  or  more  of  their  purchases  in  the  product  class 
to  one  particular  brand.  Their  empirical  experience  has  shown  that 
products  seem  to  have  from  50%  to  80%  hard  core  consumers. 

Three  major  types  of  dynamic  market  predictions  were  made  using 
the  Markov  chain  analysis.  The  first  was  the  steady-state  or  equilibrium 
market  shares.  The  equilibrium  market  shares  are  the  shares  the  brands 
would  have  if  the  process  of  brand  switching  as  currently  estimated  by 
the  transition  matrix  were  to  be  allowed  to  run  to  equilibrium.  Thus 
these  equilibrium  shares  provide  a  useful  measure  of  the  direction  in 
which the market is heading. In any real market situation the equilibrium
share is rarely reached because competitive activity tends to
alter  the  values  of  the  transition  probabilities  which  constitute  the 
transition  matrix.  The  second  prediction  is  the  average  time  to  trial. 
This  yields  information  as  to  the  average  number  of  periods  which  will 
pass  before  a  consumer  will  try  a  particular  brand  in  this  product  class. 
This  is  a  measure  of  the  attractive  power  of  the  brand.  Finally  they 
suggest  using  the  Markov  chain  model  to  evaluate  the  success  of  a  new 
product  introduction.   The  model  is  used  to  describe  the  evolution  of 
brand  shares,  new  triers,  repeat  buying  rates,  and  the  proportion  of 
hard  core  buyers. 
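
As an illustration of the first type of prediction, the sketch below computes equilibrium market shares by repeatedly applying a transition matrix to a share vector until the shares stop changing. The matrix values in any example call are invented for illustration, and the function name is hypothetical; the method assumes the chain has a unique steady state.

```python
# Hypothetical sketch: equilibrium shares of a brand switching matrix
# obtained by power iteration. transition[i][j] is the probability of
# moving from brand i in one period to brand j in the next.

def equilibrium_shares(transition, tol=1e-12, max_iter=10_000):
    n = len(transition)
    shares = [1.0 / n] * n  # arbitrary starting shares
    for _ in range(max_iter):
        nxt = [sum(shares[i] * transition[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, shares)) < tol:
            return nxt
        shares = nxt
    raise RuntimeError("shares did not converge")
```

For a two-brand matrix with rows (0.9, 0.1) and (0.3, 0.7), the balance condition 0.1 * s0 = 0.3 * s1 gives equilibrium shares of 0.75 and 0.25, which the iteration reproduces.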

Howard. Howard (1963) objects to the use of the somewhat artificial
"no purchase" state. He advocates the use of semi-Markov chains as
stochastic models of consumer brand switching behavior. In a semi-Markov
process model the time between purchases is no longer fixed but
is itself stochastic. In general, this waiting time distribution and
the Markov transition probabilities need not be independent.

In addition, Howard also considered problems of aggregation in
Markov chain models. He was the first to point out the distinction
between a flow model approach and a true stochastic approach in
applications of Markov chain models to consumer brand switching.

Styan and Smith. Styan and Smith (1964) used Markov chains to
analyze product switching behavior for a panel of British housewives. For
the twenty-six week period between January and June, 1957, each
housewife's purchase behavior in the laundry powder market was classified
into one of the following four mutually exclusive and collectively
exhaustive categories:

1.  Bought  detergent  only 

2.  Bought  soap  powder  only 

3.  Bought  both  detergent  and  soap  powder 

4.  Bought  no  laundry  powder  at  all 

These  categories  define  the  "state  space"  of  the  Markov  chain  analysis. 

The twenty-six week period enabled them to compute twenty-five
two-period transition matrices for aggregate switching behavior. That is,
a transition matrix was computed for times t-1 versus t for t = 1, 2, ...,
25. It should be emphasized that for any two-period transition matrix,
the data for all households were aggregated.

Using the $\chi^2$ test developed by Anderson and Goodman (1957), they
tested the order of the Markov chain. The null hypothesis and the
alternative were:

$H_0$: the data are from a zero order Markov chain

$H_1$: the data are from a first order chain.

Each of the twenty-five matrices was tested for $H_0$ versus $H_1$. It
was found that $H_0$ could be rejected in favor of $H_1$ at a very high level
of  significance.   Thus  the  aggregate  data  behaved  according  to  a  higher 
than  zero  order  Markov  chain.   In  this  case  there  was  not  sufficient 
data  to  test  the  first  order  hypothesis  against  second  and  higher  order 
alternatives. Note, however, that while this aggregate first order
behavior may be useful from a marketing standpoint, it does not establish
first order Markov behavior on the part of the individual households
who enter this analysis. Recall Massy's (1966) finding that aggregation
led to highly significant inferences of a first order process
whereas disaggregation for the same sample of households tended to
support the notion of zero order or independent trials behavior. Frank
(1962) and Morrison (1965b) have also warned of the danger of inferring
individual behavior based upon aggregate transition matrices when the
models used assume homogeneity of the individuals in the sample.
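
A test of zero order against first order behavior of the kind described above can be sketched as a chi square test of independence on a table of aggregate transition counts: under the zero order hypothesis the state at time t does not depend on the state at time t-1. The sketch below is an illustration only; the function name and any counts passed to it are hypothetical, and it does not reproduce the Styan and Smith data.

```python
# Hypothetical sketch: chi square statistic for a zero order chain
# (null hypothesis) against a first order chain, computed from aggregate
# transition counts. counts[i][j] is the number of households observed in
# state i at time t-1 and state j at time t.

def order_zero_vs_one_statistic(counts):
    m = len(counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(m)) for j in range(m)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(m):
        for j in range(m):
            # Expected count if the next state ignores the current state.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (counts[i][j] - expected) ** 2 / expected
    degrees_of_freedom = (m - 1) ** 2
    return statistic, degrees_of_freedom
```

The statistic is referred to a chi square distribution with the returned degrees of freedom. As the text cautions, a large value computed from aggregated counts says nothing about the order of each household's own process.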

The  stationarity  of  the  transition  matrices  over  the  twenty-six 
week  period  was  also  tested.   In  this  case,  the  null  hypothesis  of 
stationarity  could  not  be  rejected  since  the  significance  level  of  the 
test  was  over  24%.   Hence,  the  aggregate  data  appear  to  be  consistent 
with a stationary, first order Markov chain.

They  also  discuss  the  use  of  the  limiting  distribution  of  market 
share  for  the  four  alternatives.   It  was  found  that  the  market  shares 
for  these  twenty-six  weeks  did  not  vary  much  from  the  equilibrium  market 
shares  predicted  by  the  transition  matrix  formed  by  aggregating  over 
all  twenty-five  of  the  two-period  matrices.   Hence  this  market  was  very 
close  to  its  steady-state  condition.   Perhaps  the  most  refreshing 
aspect  of  this  paper  is  the  fact  that  the  authors  presented  their  data 
and  tested  the  assumptions  of  the  aggregate  Markov  model  they  applied. 

Morrison.  Morrison's  development  of  methods  for  estimating  and 
testing  a  short  run  stationary  Bernoulli  population  of  consumers  was 
presented  in  the  discussion  of  Bernoulli  models.   In  the  same  study 
Morrison  (1965b)  also  developed  two  interesting  new  Markov  models.   In 
previous  applications  of  Markov  models  it  was  always  assumed  that  each 
and  every  consumer  had  the  same  brand  switching  matrix.   Even  when  the 
total  switching  matrix  is  disaggregated  into  a  hard  core  and  a  switchers 
matrix,  previous  applications  have  assumed  identical  switching  matrices 
for  individuals  within  these  disaggregated  segments.   Morrison  has 
generalized  these  previous  models  to  include  the  case  where  individuals 
may  be  heterogeneous  with  respect  to  their  brand  switching  matrices. 
These models are presented below together with Morrison's initial
empirical findings.

Morrison's models have been developed for the binary choice case.
In terms of a consumer product market the choices might be our brand
versus all others. As time goes on, consumers make purchases in the
product class in question. Suppose we then code each of these response
sequences into a vector of 0's and 1's where a 1 denotes a purchase of
our brand while a 0 denotes a purchase of some other brand. In the
empirical work summarized below Morrison let a 1 denote a purchase of the
family's favorite (most frequently purchased) brand.

Morrison's first order Markov models were termed the Brand Loyal
Model and the Last Purchase Loyal Model. These models are defined below.

Brand Loyal Model.

For the Brand Loyal Model the consumers are postulated to be first
order 0-1 processes having transition matrices

$$
\begin{array}{c|cc}
\text{purchase } t \,\backslash\, \text{purchase } t+1 & 1 & 0 \\ \hline
1 & p & 1-p \\
0 & kp & 1-kp
\end{array}
$$

where k is a constant which is the same for all families and lies
between 0 and 1 and, in addition, p is beta distributed across the
population of consumers.

We see from the transition matrix that the population is heterogeneous
with respect to its first order transition matrices since p is
distributed across the population. The designation of this model as
the "Brand Loyal Model" denotes the fact that if p is high the consumer
is very likely to repurchase brand 1, is unlikely to switch to brand 0,
and is quite likely to switch from brand 0 to brand 1. Conversely, a
low p means that the consumer is relatively likely to repurchase brand
0, is not likely to switch from brand 0 to brand 1, but is likely to
switch from brand 1 to brand 0. Thus we see in this Brand Loyal Model
that p measures the propensity to repurchase or switch to a particular
brand. It should also be noted that if k=1 the transition matrix then
becomes

$$
\begin{array}{c|cc}
\text{purchase } t \,\backslash\, \text{purchase } t+1 & 1 & 0 \\ \hline
1 & p & 1-p \\
0 & p & 1-p
\end{array}
$$


But this is just the transition matrix of a zero order or Bernoulli
process. Since p is beta distributed, we have a beta-Bernoulli consumer
population when k=1. Thus k is an index of the "Bernoulliness" of the
population, to use a term coined by Morrison.

Last Purchase Loyal Model.

In the Last Purchase Loyal Model high values of p result in a high
propensity to repurchase the brand last purchased. Each family is
assumed to be a 0-1 first order process with transition matrix

$$
\begin{array}{c|cc}
\text{purchase } t \,\backslash\, \text{purchase } t+1 & 1 & 0 \\ \hline
1 & p & 1-p \\
0 & 1-kp & kp
\end{array}
$$

where k and p are the same as in the Brand Loyal Model.
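
The two transition matrices can be written down directly, which makes it easy to check the special case k = 1 of the Brand Loyal Model. The sketch below is illustrative only; the function names are hypothetical. In each returned matrix, the first row and column correspond to state 1 and the second to state 0.

```python
# Hypothetical sketch: Morrison's two first order transition matrices for
# a single family with parameter p and population constant k.

def brand_loyal(p, k):
    """Rows: purchase at t (1, then 0). Columns: purchase at t+1 (1, then 0)."""
    return [[p, 1.0 - p], [k * p, 1.0 - k * p]]


def last_purchase_loyal(p, k):
    """Same layout; high p favors repeating whichever brand was bought last."""
    return [[p, 1.0 - p], [1.0 - k * p, k * p]]


def is_zero_order(matrix, tol=1e-12):
    """True when both rows agree, i.e. the last purchase has no effect."""
    return all(abs(a - b) <= tol for a, b in zip(matrix[0], matrix[1]))
```

For example, `is_zero_order(brand_loyal(0.7, 1.0))` holds because both rows equal (p, 1-p), whereas any k below 1 with p > 0 makes the rows differ.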

In his study Morrison develops a minimum chi square procedure for
estimating and testing these models as well as his short run stationary,
heterogeneous Bernoulli model with arbitrary contagion. Such a procedure
yields best asymptotically normal estimates of the parameters as well as
a chi square statistic which measures the overall "goodness of fit" of
the model to the data. It should also be noted that he has developed
the power of the chi square tests against specific alternatives.

Morrison then turned his attention to an initial empirical
comparison of these models. Using Chicago Tribune panel data from January,
1956 through February, 1959, he fitted each of his models to several
alternative segmentations of the market. He found that the Brand Loyal
Model yielded the best fits as measured by the chi square statistics.
Since each of these models has a different number of degrees of freedom,
one should compare their respective chi square probability levels rather
than the actual values. Although Morrison compares the actual values in
his initial comparison of the models, the difference in degrees of freedom
is not sufficient to change the conclusions in the present case. A far more
serious problem in his empirical results is his use of overlapping
response sequences. This procedure, while perhaps not invalidating the
comparison between his models, nevertheless is likely to bias his
empirical chi square statistics. Thus we must be careful as to what we
conclude regarding the "goodness of fit" of the Brand Loyal Model in
an absolute sense.

Further, he found that consumers were not too far from
"Bernoulliness" in their brand choice behavior as measured by k in the
Brand Loyal Model. In another portion of his empirical work he found that
for the coffee data, time between purchases does not have a significant
effect on brand loyalty. This contradicts Kuehn's claims.

In sum, Morrison has made a very substantial contribution to the
study of consumer brand choice behavior. His methods and models
represent a significant advance in stochastic models of consumer behavior.

1.3  Behavioral  Science  Models  of  Response 

Nothing  so  ambitious  as  a  complete  review  of  behavioral  science 
models  of  response  is  presented  here.   Rather,  in  this  section  we  shall 
consider  two  model  types  of  direct  relevance  to  the  models  developed  in 
this report. The model types are stimulus sampling models from
mathematical learning theory and the latent Markov models developed by James
S.  Coleman. 

1.3.1  Stimulus  Sampling  Models 

Stimulus sampling theory (see footnote 8) views behavior as being
elicited by stimulus events that are associated with or conditioned to
the various responses which may be made in the situation. The basic
terms in stimulus sampling models are: stimulus, response, association,
and reinforcement. A given behavioral situation is postulated to present
the respondent with a total stimulus which is composed of hypothetical
stimulus elements. These stimulus elements are further postulated to
each be uniquely associated with or conditioned to one of the mutually
exclusive and collectively exhaustive response alternatives. When
presented with the stimulus situation, the respondent is assumed to
sample one of the stimulus elements randomly. He then makes that
response to which the sampled stimulus element is associated. Change in
the association of the stimulus elements depends upon reinforcement.
Once the sampled stimulus has elicited its associated response, the
theory postulates a reinforcement mechanism. That is, the response
which was made is either rewarding or not rewarding. If the response
was rewarding, the sampled stimulus element remains associated with
that response. If the response was not rewarding, the sampled stimulus
element changes its response association, i.e., becomes associated with
some alternative response. Much work has been done on the
non-contingent reinforcement case. Non-contingent reinforcement means
that the probability that the current response will be rewarded is
independent of which particular response has been made.

The  theory  remains  somewhat  vague  as  to  the  actual  meaning  of  the 
hypothetical  stimulus  elements.   In  their  recent  text  on  mathematical 
learning  theory  Atkinson,  Bower,  and  Crothers  (1965,  p.  346)  make  the 
following comment on this situation:

... although stimulus sampling theory has a very precise way of
representing the stimulus situation, the theory is vague, in fact
noncommitted concerning what is a stimulus. Paradoxically, the
flexibility and scope of the theory arises in part because of this
lack of commitment regarding how stimulus elements are to be defined.

Thus stimulus sampling models utilize the stimulus element construct in
order to generate dynamic response behavior without necessarily
establishing an isomorphism between the hypothetical elements and separable
portions  of  the  real  world  stimulus  situation.   It  will  be  seen  in 
Chapter  II  that  we  make  use  of  an  analogous  concept  in  order  to  generate 
dynamic  brand  choice  behavior. 

Stimulus sampling models are generally derived from a precisely
stated system of axioms. We state such a system below for the N-Element
Pattern Model (see footnote 9) under non-contingent reinforcement. The
axioms are:

Representation Axioms

R1. There are N stimulus elements, labeled $S_1, S_2, \ldots, S_N$.

R2. There are r responses, labeled $A_1, A_2, \ldots, A_r$. One of
these responses occurs on each trial.

R3. There are r + 1 reinforcing events, labeled $E_0, E_1, \ldots, E_r$.
One of these reinforcing events occurs on each trial. (In
the following, the phrase "response $A_i$ is reinforced" should
be interpreted "reinforcing event $E_i$ occurs.")

Conditioning  Axioms 

C1. At the start of a trial, each element is conditioned to
exactly one response.

C2. If an element is sampled on a trial, it becomes conditioned
with probability c to the response (if any) that is reinforced
on that trial; with probability 1 - c, it remains conditioned
as before. If the sampled element is already conditioned
to the reinforced response, it remains so.

C3. If event $E_0$ occurs on a trial, there is no change in
conditioning of the sampled element.

C4. Stimulus elements that are not sampled on a trial do not
change their state of conditioning on that trial.

C5. The probability, c, that a sampled element will become
conditioned to the reinforced response is independent of the
trial number and of events on preceding trials.

Stimulus  Sampling  Axioms 

S1. Exactly one of the N elements is sampled on each trial.

S2. The probability of sampling any particular element may be
a function of the trial number and the preceding events.
If $h_n$ denotes the sequence of stimuli, responses, and
reinforcing events that occurred up to and including trial
n-1, then the probability of sampling element i on trial
n is some function of $h_n$, denoted $\sigma_{i,n}(h_n)$. (By Axiom S1,

$$\sum_{i=1}^{N} \sigma_{i,n}(h_n) = 1$$

for every value of n.)

Response Axiom

On each trial that response is made to which the sampled element
is conditioned.

In Chapter II we formalize the General Latent Markov Model by presenting
its foundations in terms of a system of axioms similar in form to those
given above for the N-Element Pattern Model under non-contingent
reinforcement conditions.

If  there  are  only  two  possible  responses  in  the  N-Element  Pattern 
Model, the equilibrium or steady-state distribution of response
associations of the N elements is a binomial distribution. It is interesting
that  the  same  equilibrium  distribution  is  obtained  for  the  Independent 
Elements  Model  which  is  developed  in  Section  2.3  as  a  special  case  of 
the  General  Latent  Markov  Model. 
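
The binomial equilibrium result for the two-response case can be checked numerically. The sketch below rests on assumptions that go beyond the axioms as stated: each element is sampled with equal probability 1/N, response A1 is reinforced with a fixed probability pi on every trial, A2 is reinforced otherwise, and the null event E0 never occurs. The state of the chain is the number of elements conditioned to A1. All names are hypothetical.

```python
# Hypothetical sketch: N-Element Pattern Model with two responses under
# non-contingent reinforcement, viewed as a Markov chain on the number of
# elements conditioned to A1. Assumes uniform sampling of elements.
from math import comb


def pattern_model_chain(n_elements, c, pi):
    """Transition matrix over states 0..N (elements conditioned to A1)."""
    size = n_elements + 1
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        # Sample an A2 element, A1 is reinforced, conditioning succeeds.
        up = c * pi * (n_elements - i) / n_elements
        # Sample an A1 element, A2 is reinforced, conditioning succeeds.
        down = c * (1.0 - pi) * i / n_elements
        if i < n_elements:
            matrix[i][i + 1] = up
        if i > 0:
            matrix[i][i - 1] = down
        matrix[i][i] = 1.0 - up - down
    return matrix


def binomial_pmf(n, pi):
    return [comb(n, i) * pi ** i * (1.0 - pi) ** (n - i) for i in range(n + 1)]


def stationarity_gap(dist, matrix):
    """Largest change in any state probability after one step of the chain."""
    n = len(dist)
    stepped = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return max(abs(a - b) for a, b in zip(stepped, dist))
```

Under these assumptions the binomial distribution with parameters N and pi is left unchanged by one step of the chain, so `stationarity_gap(binomial_pmf(N, pi), pattern_model_chain(N, c, pi))` is zero up to rounding, whatever the value of c.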

1.3.2  Coleman's  Latent  Markov  Models 

The  present  report  is  a  clarification,  extension,  and  empirical  test 
of  a  class  of  stochastic  models  first  proposed  by  James  S.  Coleman. 
Since the remainder of this report deals with the development,
estimation, and test of these latent Markov models, the models will not be
reviewed here. Explicit discussion of them will be deferred until it
arises naturally in later chapters. The purpose of the present section
is to summarize very briefly the accomplishments and limitations of
Coleman's work with these models. A full appreciation of the scope of
his achievements requires that the reader consult Coleman's own work.

From  the  viewpoint  of  the  present  author,  Coleman's  achievements 
in  his  work  with  latent  Markov  models  may  be  summarized  as  follows: 

1. he provided the initial formulations of this class of
stochastic models,

2. he has derived the complex steady-state distributions of
certain latent Markov models,

3. he has given a suggestive, albeit unproven, aggregation
procedure, and

4. he has made initial application of the models to such areas
as group voting behavior and consumer brand choice.

These have been major contributions to the development of latent Markov
models.

In addition to noting Coleman's accomplishments in the development
of latent Markov models, it is also necessary to consider the limitations
of the work which has been done to date. There are two principal
criticisms which may be made:

1. Coleman nowhere applies a statistical test of whether or not
his models are consistent with a set of data,

2. he ignores questions of the properties of his parameter
estimates and seems content, for the most part, to use estimates
derived under conditions which just identify the parameters.

The fact that he simply assumes his models to be true rather than
providing a test of this assumption and the fact that he nearly always uses
just identified parameter estimates must temper our willingness to
generalize and draw conclusions from his empirical applications.

1.4  Overview 

The remaining chapters are sketched in brief below:

Chapter II. In this chapter we present some general concepts, the
notion of a diffusion limit, and market measures from latent Markov models.
The General Latent Markov Model is then formulated based upon a system
of axioms. Finally, the special case of the general model known as
the Independent Elements Model is presented. This model is shown to
be unsatisfactory as a representation of consumer brand choice behavior.

Chapter III. This chapter presents a model which is a viable
representation of consumer behavior. The basic formulation and the
steady-state distribution are due to Coleman. Using Coleman's foundation we
first show that the steady-state distribution of the model is
asymptotically a beta distribution. We then show that the process is a
viable response model by deriving its mean value function and its
variance at any given time. The diffusion limit of the process is then
discussed and the asymptotic form of the steady-state distribution is
shown to satisfy the necessary form of the Fokker-Planck diffusion
equation. A proof of an aggregation procedure and the formulation of
preliminary estimating equations follow. Finally a procedure for
estimating the first two moments of the cross sectional distribution of
response probability is presented.

Chapter  IV.   In  this  chapter  we  derive  and  analyze  two  alternative 
estimation  procedures  for  the  model  developed  in  Chapter  III.   The  pros 
and  cons  of  the  recursive  regression  method  and  the  minimum  chi  square 
procedure  are  discussed. 

Chapter  V.   The  model  developed  in  Chapter  III  is  formally  tested 
by  the  methods  derived  in  Chapter  IV.   It  is  found  that  the  model  appears 
to  be  a  viable  model  of  empirical  consumer  brand  choice  behavior. 

Chapter  VI.   This  chapter  presents  a  summary  of  what  has  been 
achieved  in  this  report  and  some  suggestions  for  future  research. 

Footnotes


1.  See  Buzzell  (1964),  Bass  et  al.  (1961),  and  Frank,  Kuehn,  and  Massy 
(1962). 

2.  See  Bass  et  al.  (1961). 

3.  The  new  product  failure  rate  has  been  variously  estimated  at  70-90% 
of  all  new  products  which  are  actually  introduced  to  the  market. 

4.  See  Siegel  (1956,  pp.  136-145). 

5.  See  the  section  below  on  Massy's  methods  for  estimating  and  testing 
the  linear  learning  model. 

6. While this assumption appears to be reasonable prior to the first
purchase of a consumer non-durable innovation, it seems less
reasonable once the innovation has, in fact, been tried. Perhaps
the rejection operator should be divided into two conditional
operators. Haines' operator would be applied to no purchase
trials prior to the first trial, while the standard form of the
rejection operator would be applied once an initial purchase
has been made. It should be noted that this more realistic
approach will complicate the mathematics of the process, perhaps
even rendering the model intractable.

7.  The  present  author  prefers  the  use  of  the  term  heterogeneity  to 
Morrison's  use  of  contagion. 

8.  See  Atkinson,  Bower,  and  Crothers  (1965,  Chpt.  8)  and  Atkinson 
and  Estes  (1963)  for  formal  development  and  discussion  of  stimulus 
sampling  theory. 


9.  Atkinson,  Bower,  and  Crothers  (1965,  p.  353). 
10. See especially Coleman (1964a and b).


Chapter  II 
LATENT  MARKOV  MODELS  OF  CONSUMER  BRAND  CHOICE 

In the present chapter we introduce latent Markov models of choice
behavior. While the models are applicable to many types of choice
situations, the primary concern in this study is choice behavior as it
relates to the consumer's selection of alternative brands in consumer
product markets. We will first present a general discussion of latent
Markov models. Attention will also be given to measures of interest
to marketing managers and market researchers which may be obtained from
these models. Once this general discussion is concluded, the formal
assumptions of latent Markov models will be specified as a series of axioms.
Finally, the chapter will conclude with a discussion of a particular
model from the general class of latent Markov models. It will be seen
that this model is not a satisfactory representation of consumer choice
behavior.

2.1   Introduction  to  Latent  Markov  Models 

2.1.1  General  Discussion 

In  this  report  we  shall  confine  our  attention  to  latent  Markov 
models  of  the  binary  response  situation.   See  Chapter  VI  for  suggested 
extensions  to  the  N-chotomous  response  case. 

To  set  the  stage  for  what  follows  it  is  useful  to  discuss  certain 
terms.   In  particular,  we  shall  at  this  point  explain  what  is  meant  by 
response,  response  probability,  the  state  space  of  the  latent  Markov 
process, choice or response occasion, and response elements. The
explanations follow:

Response. In the binary choice model there are two mutually exclusive
and collectively exhaustive responses, say A and B. Examples
might be intention to vote Democratic versus intention to vote
Republican, or purchase of brand A versus brand B in some product class.

Response  occasion.  A  response  occasion  is  the  event  that  the 
responding organism has made one of the two alternative responses. For
example,  in  the  brand  choice  case  a  response  occasion  would  represent 
the  purchase  of  one  of  the  two  brands. 

Response probability. This represents the probability that a particular
response will be elicited on any given response occasion. Note
that this probability is conditional upon a response being made. In
the binary response case it is sufficient to consider just P(A), the
probability of making response A, since P(B) = 1 - P(A). As a probability
measure, P(A) naturally has the bounds

$0 \le P(A) \le 1$.

State-space  of  the  latent  Markov  process.   The  latent  Markov  pro- 
cess which  is  developed  in  later  sections  operates  on  the  response 
probabilities.  Thus  the  state-space  of  the  latent  Markov  process  is 
the  set  of  all  possible  values  of  the  response  probability.   There  may 
be  a  finite  number  of  states  or  an  infinite  number.   The  latter  case 
occurs  when  we  pass  to  the  diffusion  limit  of  a  model.   See  Section  2.1.2 
for  an  introduction  to  the  diffusion  limit  and  see  Section  3.4  for  the 
development  of  the  diffusion  limit  of  the  Cohesive  Elements  Model,  which 
is  the  model  presented  in  Chapter  III. 

Response elements. These are hypothetical constructs which are
used to provide a conceptual framework for the derivation of the dynamic
properties  of  the  latent  Markov  models.  These  response  elements  are 
conceptually  similar  to  the  stimulus  elements  of  stimulus  sampling 
theory  and  are  used  in  an  analogous  manner.  As  in  the  stimulus  sampling 
models  discussed  in  Section  1.3,  there  is  no  need  in  the  present  case 
to  achieve  an  isomorphism  between  these  hypothetical  response  elements 
and  any  overt  real  world  phenomenon.   They  simply  represent  a  useful 
conceptual  construct. 

Each  response  element  is  assumed  to  form  an  association  with  one 
of  the  two  alternative  responses.   Each  individual  respondent  in  the 
population  is  assumed  to  have  a  set  of  N  of  these  response  elements. 
Note  that  N  may  be  finite  or  infinite.  An  individual  respondent's 
probability  of  making  response  A  on  any  response  occasion  is  determined 
by  the  proportion  of  his  response  elements  which  are  associated  with 
response  A. 

An  individual's  response  probability  is  not,  however,  static. 
Since  we  further  postulate  a  process  whereby  the  response  elements  may 
change  their  association,  the  response  probability  changes  through  time. 
The  change  process  is  postulated  to  be  a  continuous  time  stochastic 
process  on  the  response  elements.   Thus  changes  in  response  probability 
may  occur  at  any  point  in  time. 

2.1.2  Probability  Diffusion  Model  Viewpoint 

The  reader  has  probably  noticed  by  now  that  we  have  continually 
spoken of our models as latent Markov models, in contrast to the title
of this report, which is A Probability Diffusion Model of Dynamic Market
Behavior. In this section we shall attempt to clarify this apparent
inconsistency. 

Let  us  suppose  for  the  moment  that  we  are  interested  in  the  binary 
choice  case.  Let  us  denote  the  two  mutually  exclusive  and  collectively 
exhaustive  response  or  choice  alternatives  as  A  and  B.  It  will  be  suf- 
ficient to  consider  just  P(A).  Now  further  suppose  that  P(A)  undergoes 
change  at  discrete  random  intervals  of  time  and  that  at  each  change 
point  it  changes  by  some  discrete  amount  1/N.  Hence  P(A)  is  a  discrete 
random  variable.  The  time  path  of  P(A)  might  appear  as  in  Figure  2-1. 

FIGURE 2-1
A RANDOM WALK EXAMPLE

Since the change in P(A) is discrete at each change event, the progress
of P(A) through time is a random walk. Now if we let $N \to \infty$, and
hence $1/N \to 0$, at the same time that the time between change events
goes to 0, the random walk on P(A) described above passes into a
diffusion process.
See  Feller  (1957)  for  a  more  detailed  account  of  this  type  of  passage. 
Now  the  time  path  of  P(A)  might  appear  as  in  Figure  2-2. 


FIGURE 2-2
A DIFFUSION EXAMPLE
[Figure: time path of P(A), with 1.0 marked on the vertical axis, plotted against time.]


The  model  now  has  a  continuous  state  space  and  operates  in  continuous 
time. 
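
The passage from the random walk to the diffusion limit can be illustrated with a short simulation. The following is only a sketch under assumptions not made in the text: up and down steps are taken to be equally likely, P(A) is simply clamped at 0 and 1, and the function name and parameters are hypothetical.

```python
import random

def simulate_random_walk(n_elements, n_events, start_count, seed=0):
    """Return the path of P(A) = i/N over a sequence of change events.

    Assumptions for illustration only: each change event moves i up or
    down by one with equal probability, and i is clamped to [0, N].
    """
    rng = random.Random(seed)
    i = start_count
    path = [i / n_elements]
    for _ in range(n_events):
        step = rng.choice((-1, 1))              # P(A) changes by 1/N
        i = min(max(i + step, 0), n_elements)   # keep P(A) within [0, 1]
        path.append(i / n_elements)
    return path

# A coarse walk (N = 10) and a much finer one (N = 1000).
coarse = simulate_random_walk(n_elements=10, n_events=50, start_count=5)
fine = simulate_random_walk(n_elements=1000, n_events=5000, start_count=500)
```

With N = 10 each change moves P(A) by 0.1, so the path is visibly a staircase; with N = 1000 the steps are 0.001 and the path begins to resemble the continuous trajectory of Figure 2-2.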

The  diffusion  limit  discussed  above  corresponds  to  the  case  where 
we  let  the  number  of  response  elements  increase  without  limit  in  a 
latent  Markov  model.  We  shall  present  the  diffusion  limit  of  a  par- 
ticular latent  Markov  model  in  Section  3.4.   In  the  models  developed 
in the remainder of this report we are nearly always concerned with the
behavior of the process as the number of response elements, N, goes to
infinity. In this case, the latent Markov models become probability
diffusion models. The title of this report was chosen to underscore
our primary concern with the case where $N \to \infty$ and the time between
change events goes to zero. This case yields the most appealing models
of behavior, particularly with respect to consumer brand choice behavior.
This assertion is clarified later in this chapter.

2.1.3  Marketing  Measures  from  Latent  Markov  or  Probability  Diffusion 
Models 

In  the  marketing  literature  considerable  attention  has  been  given 
to  the  concept  of  "brand  loyalty."  See  for  example,  Brown  (1952), 
Cunningham  (1956)  and  (1961),  and  Farley  (1964).   A  myriad  of  measures 
of  brand  loyalty  has  appeared.   In  particular  see  Rice's  (1962)  factor 
analytic  study  of  the  interrelationship  between  alternative  measures 
of  brand  loyalty. 

The  most  intuitively  appealing  measure  of  brand  loyalty  is  the  con- 
sumer's probability  of  purchasing  a  particular  brand  on  any  given  pur- 
chase occasion.   Such  a  probability  provides  a  continuous  measure  of 
the  consumer's  inclination  or  predisposition  to  purchase  the  brand  in 
question.   Instances  of  the  use  of  such  a  measure  may  be  found  in  Kuehn 
(1958), Frank (1960), and Morrison (1965a and b).

The  latent  Markov  or  probability  diffusion  models  developed  in  this 
report  will  yield  estimates  of  the  distribution  of  brand  loyalty  across 
a  population  of  heterogeneous  consumers  at  some  response  occasion.   That 
is, the models will enable us to obtain estimates of the distribution
of response probability in the consumer population at some response
occasion.

In  these  models,  as  is  true  for  linear  learning  models,  we  may 
also  obtain  estimates  of  the  rate  of  approach  to  equilibrium  from  any 
disequilibrium  market  situation.   In  addition  we  obtain  an  estimate  of 
the  equilibrium  choice  share  of  response  A  for  the  population.  When 
the  responses  are  brand  choices,  the  latter  is  of  obvious  interest  to 
the  marketing  manager. 

It  will  be  seen  in  Chapter  V  that  we  also  obtain  measures  which 
enable  us  to  infer  a  brand's  attractive  power  as  well  as  its  retentive 
power. 

2.2  The  General  Latent  Markov  Model 

In  this  section  we  shall  formulate  the  basic  latent  Markov  model 
in  terms  of  a  set  of  axioms.  The  axioms  shall  be  classified  into  three 
subclasses:   specification,  response,  and  Markov  process.   Immediately 
following  the  presentation  of  each  subclass  of  axioms  will  be  a  dis- 
cussion of  that  subset.  The  section  will  conclude  with  the  derivation 
of  the  system  of  differential  equations  implied  by  the  Markov  process 
axioms.   These  axioms  provide  a  summary  of  the  basic  assumptions  re- 
quired in  both  of  the  latent  Markov  models  which  follow. 

2.2.1  Specification  Axioms 

S1. All individuals in the population behave according to the
same continuous time Markov process.

S2. On any response occasion there are two mutually exclusive
and collectively exhaustive responses, $A_0$ and $A_1$.

S3. Within each individual there are N (possibly infinitely many)
hypothetical response elements.

S4. At any given time t, each of the N response elements within
an individual is uniquely associated with either response
$A_0$ or $A_1$.

It should be remarked that by Axiom S1 we have assumed that all
individuals in the population are identical with respect to the process
which determines the dynamics of their response probabilities. This
does not, however, say that individuals have the same response
probability at any given time, t. In fact, in the development of the two
models we shall explicitly postulate that at t=0 the individuals in
the population are distributed with respect to their initial
probability of making an $A_1$ response. Axiom S2 indicates that in the present
study we shall confine our attention to a one dimensional stochastic
process which generates a sequence of 0's and 1's as observable outcomes.

2.2.2  Response  Axioms 

R1. If at time t an individual has i of his elements associated
with response $A_1$ and if he makes a response at time t, then
his probability of making response $A_1$ at time t will be

$P\{A_1 \text{ at time } t \mid i \text{ elements associated with } A_1\} = \frac{i}{N}$

where N is the total number of response elements.

R2. Pseudo-Bernoulli Trials Assumption. The history of the process
does not affect the probability of response $A_1$ at time t.
Formally,

$P\{A_1(t) \mid A_i(t-1), A_j(t-2), \ldots, A_k(t-n)\} = P\{A_1(t)\}$

for i, j, k = 0, 1 and for all n, t.

R3. Individuals in the population respond independently of one
another.

The fundamental role of the hypothetical response elements is
indicated by Axiom R1. This axiom states that an individual's probability
of making response $A_1$ at any time t is just equal to the proportion of
his response elements which are associated with response $A_1$ at time t.
Axiom  R2   is  denoted  as  a  pseudo-Bernoulli  trials  assumption  because 
it  captures  the  essence  of  Bernoulli  trials  in  that  there  are  two  mutually 
exclusive  and  collectively  exhaustive  response  possibilities  and  the 
probability  of  the  occurrence  of  these  alternatives  is  independent  of 
the  history  of  the  process.  It  is  at  this  point  that  the  current  formu- 
lation stands  in  direct  contrast  to  learning  models,  which  postulate  that 
the  particular  sampling  function  observed  will  have  an  influence  upon 
the  future  probability  of  the  outcomes.  The  adjective  "pseudo"  is  used 
in  recognition  of  the  fact  that  "true"  Bernoulli  trials  require  that 
the  process  generating  the  trials  remains  stationary.  As  will  become 
evident  from  the  Markov  process  axioms  discussed  below,  the  response 
probability  for  any  given  individual  is  subject  to  change  over  time. 
Hence  the  stationarity  assumption  of  "true"  Bernoulli  trials  is  violated. 
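
Axioms R1 and R2 can be made concrete with a small sketch. It is an illustration only: the function names are hypothetical, responses are coded 1 for $A_1$ and 0 for $A_0$, and the state i is held fixed across draws, whereas in the model it changes through time.

```python
import random

def response_probability(i, n_elements):
    """Axiom R1: P(A_1) is the proportion of elements associated with A_1."""
    return i / n_elements

def draw_response(i, n_elements, rng):
    """Return 1 for response A_1 and 0 for response A_0.

    Per Axiom R2 the draw depends only on the current state i,
    not on any earlier responses.
    """
    return 1 if rng.random() < response_probability(i, n_elements) else 0

rng = random.Random(42)
# Twenty response occasions for an individual with 3 of 10 elements on A_1.
responses = [draw_response(3, 10, rng) for _ in range(20)]
```

Because the state is frozen here, the draws are true Bernoulli trials; the "pseudo" qualifier applies once i is allowed to move between response occasions.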

Axiom  R3  will  prove  to  be  of  considerable  importance  when  we  con- 
sider problems  of  aggregation  and  estimation.  Perhaps  it  would  be  well 
to  state  at  this  point  that  in  applications  of  these  latent  Markov  models 
to  consumer  panel  data  this  axiom  is  fulfilled  by  the  nature  of  the 
data  gathering  process.   Panel  households  are  sufficiently  dispersed 
geographically  to  assure  us  of  nearly  zero  interaction. 

In summary, Axiom R2 assumes that for any given individual
responses will be independent over time. Axiom R3 assures us that the
responses are independent cross-sectionally between individuals. Hence
all responses of all individuals are postulated by the model to be
independent.

Before we specify the Markov process axioms, it is necessary to
indicate that in the models being considered, as in the case of linear
learning models, the state space will be the individual's response
probability. Since each individual in the population is assumed to have the
same number of response elements, N, an equivalent state space would be
the number of response elements associated with response $A_1$. We shall
denote the states of this latter state space by i. In addition, we shall
use the symbol $o(\Delta t)$ to denote terms of an order which tend to zero
faster than $\Delta t$. Formally this symbol denotes terms for which

$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$.

2.2.3  Markov  Process  Axioms 

M1. If at time t an individual is in state i (i = 1, 2, ..., N), the
probability of the transition $i \to i+1$ in the interval $(t, t+\Delta t)$
is $\lambda_i \Delta t + o(\Delta t)$.

M2. If at any time t an individual is in state i (i = 1, 2, ..., N), the
probability of the transition $i \to i-1$ in the interval
$(t, t+\Delta t)$ is $\mu_i \Delta t + o(\Delta t)$.

M3. The probability of a transition to other than a neighboring
state is $o(\Delta t)$. Formally, the probability of the transition
$i \to j$ for $|i-j| > 1$ is $o(\Delta t)$. Hence

$\lim_{\Delta t \to 0} \frac{P\{|i-j| > 1\}}{\Delta t} = 0$.

M4. The process is stationary. That is, $\lambda_i$ and $\mu_i$ are independent
of time.

An immediate consequence of M1 to M3 is that the probability that
an individual will remain in state i during the interval $(t, t+\Delta t)$ is
$1 - (\lambda_i + \mu_i)\Delta t + o(\Delta t)$. Note that we have simply postulated a birth and
death process on the response elements in terms of their association
with response $A_1$ or $A_0$. The reader familiar with birth and death
processes in probability theory will probably only want to skim Section 2.2.4.

The  above  system  of  axioms  specifies  a  latent  Markov  process  which 
operates  in  continuous  time  upon  the  response  elements  within  any  indi- 
vidual.  It  is  Markovian  with  respect  to  the  state  space  of  response 
probabilities.   The  Markov  process  is  latent  in  that  it  operates  on  the 
unobservable  state  space  of  response  probabilities. 

2.2.4  System  of  Differential  Equations 

From the Markov process axioms M1-M4 we are able to develop a system
of differential equations on the probability that an individual will
occupy a given state at time t. First consider the state probability
$p_0(t)$, the probability that at time t none of the response elements are
associated with response $A_1$. For a small increment of time, $\Delta t$, we
would like to know the probability that this individual will be in state
0. Since $\Delta t$ is a very small increment of time, our axioms indicate
that during such an interval at most one of the total of N response
elements could have shifted its association either from response $A_0$ to $A_1$
or from $A_1$ to $A_0$. Hence he could only be in state 0 at time $t+\Delta t$ if one
of two mutually exclusive and collectively exhaustive events has occurred.
Either he was in state 0 at time t and none of his response elements
has shifted its association by time $t+\Delta t$, or he was in state 1 at time
t and the one element that was associated with response $A_1$ has shifted
to an association with response $A_0$ by time $t+\Delta t$. The probability that
he was in state 0 and no change occurred is $p_0(t)[1 - \lambda_0 \Delta t - o(\Delta t)]$, where
the first factor is the probability that he was in state 0 at time t
and the second factor is the probability of no change in his state
during $\Delta t$ if he was. Similarly, the probability that he was in state
1 at time t and shifts to state 0 by time $t+\Delta t$ is just $p_1(t)[\mu_1 \Delta t + o(\Delta t)]$.
Since the two paths to arrive at state 0 by time $t+\Delta t$ are mutually
exclusive and collectively exhaustive, it is clear that the total
probability that this individual respondent is in state 0 at time $t+\Delta t$ is given
by the sum of the probability of each of these two paths:

2.2-1    $p_0(t+\Delta t) = p_0(t)[1 - \lambda_0 \Delta t] + p_1(t)\mu_1 \Delta t + o(\Delta t)$

where we have collected all terms of $o(\Delta t)$ in $o(\Delta t)$.

Subtracting $p_0(t)$, dividing through by $\Delta t$, and taking the limit
as $\Delta t$ goes to zero yields

2.2-2    $\lim_{\Delta t \to 0} \frac{p_0(t+\Delta t) - p_0(t)}{\Delta t} = -\lambda_0 p_0(t) + \mu_1 p_1(t)$

where we have used the fact that $\lim_{\Delta t \to 0} o(\Delta t)/\Delta t = 0$.
But we note that the left side of 2.2-2 is just the definition of the
derivative of the function $p_0(t)$. Hence we have the following
differential equation for the probability of state 0:

2.2-3    $\frac{dp_0(t)}{dt} = -\lambda_0 p_0(t) + \mu_1 p_1(t)$

By similar reasoning we may complete the system of equations for all
N+1 of the possible states of the individual respondent. These equations
are

2.2-4    $\frac{dp_i(t)}{dt} = -(\lambda_i + \mu_i) p_i(t) + \lambda_{i-1} p_{i-1}(t) + \mu_{i+1} p_{i+1}(t)$    for $0 < i < N$

2.2-5    $\frac{dp_N(t)}{dt} = -\mu_N p_N(t) + \lambda_{N-1} p_{N-1}(t)$

In the models developed in the subsequent two sections of this chapter
we shall further specify $\lambda_i$ and $\mu_i$. Then we shall examine the steady-
state distribution of the response states or, equivalently, the steady
state distribution of response probability.
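
The system 2.2-3 through 2.2-5 can be written as a single routine that returns the time derivative of every state probability. This is a minimal sketch: the function name, the list-based representation, and the example rates are hypothetical, and the caller is assumed to supply mu[0] = 0 and lam[N] = 0 so that the boundary equations 2.2-3 and 2.2-5 are reproduced.

```python
def state_derivatives(p, lam, mu):
    """Right-hand sides of 2.2-3, 2.2-4 and 2.2-5 for p[0..N].

    lam[i] and mu[i] are the upward and downward transition
    intensities from state i; mu[0] and lam[N] must be zero.
    """
    n = len(p) - 1
    dp = []
    for i in range(n + 1):
        outflow = (lam[i] + mu[i]) * p[i]
        inflow_from_below = lam[i - 1] * p[i - 1] if i > 0 else 0.0
        inflow_from_above = mu[i + 1] * p[i + 1] if i < n else 0.0
        dp.append(-outflow + inflow_from_below + inflow_from_above)
    return dp

# Arbitrary illustrative rates for N = 3.
lam = [0.3, 0.2, 0.1, 0.0]
mu = [0.0, 0.4, 0.5, 0.6]
p = [0.1, 0.2, 0.3, 0.4]
dp = state_derivatives(p, lam, mu)
```

The derivatives sum to zero, as they must if the state probabilities are to keep summing to one: probability only moves between neighboring states.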

To  the  reader  versed  in  probability  theory  the  above  development  is 
clearly  a  special  case  of  the  general  birth-death  process.   For  example, 
see  Feller  (1957).   The  only  difference  lies  in  the  fact  that  there  is 
an  upper  bound  on  the  "size  of  the  population."  That  is,  no  more  than 
N response elements may be associated with response $A_1$, where we now
consider that a response element which shifts its association to response
$A_1$ is "born" and an element which shifts its association to response $A_0$
has "died." If we should let the number of response elements increase
without  limit,  then  the  above  system  of  differential  equations  holds  for 
the  general  birth-death  process  in  which  there  is  no  upper  bound  to  the 
ultimate size of the population.

2.3   Independent Elements Model

2.3.1  Formulation 

We  now  turn  our  attention  to  the  development  of  a  special  case  of 
the  general  model.   This  model  type  has  been  used  by  Coleman  in  studies 
of  consumer  behavior  (Coleman,  1963)  and  in  studies  of  voting  behavior 
in  small  union  groups  (Coleman  1964b,  Chpt.  11). 

In  the  Independent  Elements  Model  as  well  as  in  the  remainder  of 
this  report,  we  shall  denote  the  response  probability  of  an  individual 
at  time  t  by 

X(t)  =  i/N, 

where i is the number of the N response elements associated with response
$A_1$.

Before we may consider the form of the steady-state distribution
of response probability, X(t) = i/N, we must first further specify the
properties of the latent Markov model. In particular, we shall assume
that:

(i)   the N response elements within an individual behave
independently,

(ii)   each element associated with response $A_0$ has transition
intensity $\alpha$ toward becoming associated with response $A_1$,

(iii)   each element associated with response $A_1$ has transition
intensity $\beta$ toward becoming associated with response $A_0$.

These additional assumptions enable us to specify the $\lambda_i$ and $\mu_i$ of the
general model. In particular,

2.3-1    $\lambda_i = (N-i)\alpha$

and

2.3-2    $\mu_i = i\beta$

Equation 2.3-1 represents the fact that for h an interval of time so
small that at most one element may change its state there are N-i
elements associated with response $A_0$ which could shift their
association to response $A_1$. Hence in order for the state of an individual to
change from i to i+1, one of the N-i elements associated with response
$A_0$ would have to change its association. This can happen in N-i
independent ways during the interval h. Similarly, one may develop
equation 2.3-2.

Substituting 2.3-1 and 2.3-2 into 2.2-3, 2.2-4, and 2.2-5, we have
the following system of differential equations for the Independent
Elements Model:²

2.3-3    $\frac{dp_0(t)}{dt} = -N\alpha\, p_0(t) + \beta\, p_1(t)$

         $\frac{dp_i(t)}{dt} = -[(N-i)\alpha + i\beta]\, p_i(t) + (N-i+1)\alpha\, p_{i-1}(t) + (i+1)\beta\, p_{i+1}(t)$    for $0 < i < N$

         $\frac{dp_N(t)}{dt} = -N\beta\, p_N(t) + \alpha\, p_{N-1}(t)$

The steady-state distribution of i is given by

2.3-4    $p_i = \frac{N!}{i!\,(N-i)!} \left(\frac{\alpha}{\alpha+\beta}\right)^{i} \left(\frac{\beta}{\alpha+\beta}\right)^{N-i}$

which is simply the binomial distribution. We have dropped the index,
t, since we are considering the distribution in the steady-state. The
solution 2.3-4 may be obtained by setting $dp_i(t)/dt = 0$ for i = 0, 1, ..., N
or by the method of generating functions given in Feller (1957). Now we
have the well known results for the binomial mean and variance

2.3-5    $E[i] = N \left(\frac{\alpha}{\alpha+\beta}\right)$

2.3-6    $Var[i] = N \left(\frac{\alpha}{\alpha+\beta}\right) \left(1 - \frac{\alpha}{\alpha+\beta}\right)$

Recall that we express an individual's response probability as X(t) = i/N.
Then for X(t) in the steady-state we have

2.3-7    $E[x] = \frac{E[i]}{N} = \frac{\alpha}{\alpha+\beta}$

2.3-8    $Var[x] = Var\left[\frac{i}{N}\right] = \frac{1}{N^2} Var[i] = \frac{1}{N} \left(\frac{\alpha}{\alpha+\beta}\right) \left(1 - \frac{\alpha}{\alpha+\beta}\right)$

It  remains  to  examine  the  behavior  of  this  Independent  Elements 
Model  as  a  function  of  N,  the  number  of  response  elements  within  an 
individual. There are two extreme cases of interest: N very small and N
very large.

Case 1. N very small.

In this case it is clear that an individual's response probability
may only take on a few, widely spaced values. If N=1, he may only have
the response probability values x(t) = 0 or x(t) = 1. If N=2, he may
only have the values x(t) = 0, x(t) = 1/2, and x(t) = 1. In fact, his
response probability may only take on N+1 distinct values if there are
N  elements.   It  is  more  intuitively  appealing  to  have  a  model  which  al- 
lows a  response  probability  continuum.   Hence  we  shall  now  turn  to  the 
case  where  the  number  of  response  elements  increases  without  limit. 
Case  2.   N  increases  without  limit. 

Clearly, in the limit, the values an individual's response probability
may take on approach a continuum. This, of course, is a more
appealing situation than that of Case 1. But the Independent Elements
Model breaks down in another sense in this case. If we examine 2.3-8,
we see that the variance of response probability in the steady-state
is inversely related to the number of response elements. Thus, if we
let N increase without limit, we have

2.3-9    $\lim_{N \to \infty} Var[x] = \lim_{N \to \infty} \frac{1}{N} \left(\frac{\alpha}{\alpha+\beta}\right) \left(1 - \frac{\alpha}{\alpha+\beta}\right) = 0$

This tells us that as $N \to \infty$, an individual described by the Independent
Elements Model will have a response probability in the steady-state
exactly equal to $\alpha/(\alpha+\beta)$. Since we are assuming that the same process
in terms of $\alpha$ and $\beta$ holds for all the independent members of the
population, we see that the Independent Elements Model implies that the
entire population converges to the response probability $\alpha/(\alpha+\beta)$ in the
steady-state regardless of their starting state. Hence the model implies
homogeneity of response probability in the steady-state. Such a result
is no more appealing than the assumption that all individuals start out
with the same response probability, as has generally been the case in
applications of the linear learning models.³

Although  the  above  comments  indicate  why  the  Independent  Elements 
Model  is  inappropriate  as  a  model  of  consumer  choice  behavior,  there 
are  instances  in  the  social  sciences  where  the  model  may  prove  useful. 
For example, Coleman (1964b, pp. 336-343) has applied it to voting
behavior in small groups. Our elements correspond to his individuals and
his  groups  correspond  to  our  individuals.   The  above  comments  would 
certainly  caution  against  using  this  model  except  for  small  group  analy- 
sis and  only  then  if  one  can  assume  that  individuals  make  their  choices 
independently. 

Fortunately  for  the  utility  of  the  latent  Markov  models,  it  is 
possible  to  rectify  the  problems  noted  above  by  postulating  a  process 
in  which  there  are  cohesive  forces  between  the  elements.  Such  a  model 
is  studied  in  detail  in  Chapter  III. 

Although  we  already  know  that  the  Independent  Elements  Model  is 
unsatisfactory,  it  nevertheless  would  prove  instructive  to  examine  the 
evolution  of  an  individual's  response  probability  through  time  under 
this  model. 

2.3.2  Mean  Value  Function  of  the  Independent  Elements  Model 

The mean value function of a stochastic process is defined as
m(t) = E[x(t)], where x(t) is the state variable at time t. In the models
which we are considering x(t) is the probability of making response $A_1$
at response occasion t. Recall that in terms of the model x(t) is just
the proportion of the response elements which are associated with
response $A_1$ at response occasion t.⁴

The development of m(t) requires that we first examine the expected
change in response probability during the interval (t, t+h), where h is
an increment of time so small that at most one element could have changed
its association during h. Formally, we seek $E[x(t+h) - x(t)]$. Now it is
known from probability theory that

2.3-10    $E_y[y] = E_x\left[E_{y|x}[y \mid x]\right]$

where the subscripts denote the random variable with respect to which
the expectation is being taken. In terms of our present process we have

2.3-11    $E[x(t+h) - x(t)] = E_x\left[E\{x(t+h) - x(t) \mid x(t) = i/N\}\right]$

We have sufficient information to compute $E[x(t+h) - x(t) \mid x(t) = i/N]$.
For h sufficiently small so that the system may change by at most one
unit during h there are three possible values of $x(t+h) - x(t) = \Delta x(t)$.
These values and their associated probabilities are given in Table 2-1.

TABLE 2-1
$\Delta x(t)$ CONDITIONAL ON $x(t)$
INDEPENDENT ELEMENTS MODEL

$\Delta x(t)$        $P\{\Delta x(t) \mid x(t) = i/N\}$

$1/N$          $\lambda_i h = (N-i)\alpha h$

$0$            $1 - (\lambda_i + \mu_i) h$

$-1/N$         $\mu_i h = i\beta h$

From Table 2-1 we have

2.3-12    $E[\Delta x(t) \mid x(t) = i/N] = \frac{h}{N}\{(N-i)\alpha - i\beta\}$

          $= h\{[1 - x(t)]\alpha - \beta x(t)\}$

          $= h\{\alpha - (\alpha+\beta) x(t)\}$

Substituting 2.3-12 into 2.3-11 gives us

2.3-13    $E[x(t+h) - x(t)] = h\{\alpha - (\alpha+\beta) E[x(t)]\}$
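
The step from Table 2-1 to 2.3-12 is a three-term weighted sum, which can be checked directly. The sketch below uses hypothetical function names and arbitrary values of N, $\alpha$, $\beta$ and h; it is a check of the arithmetic, not part of the model.

```python
def expected_increment_from_table(i, n, alpha, beta, h):
    """Sum value * probability over the three rows of Table 2-1."""
    lam_i = (n - i) * alpha
    mu_i = i * beta
    table = [
        (1 / n, lam_i * h),              # one element moves to A_1
        (0.0, 1 - (lam_i + mu_i) * h),   # no change during h
        (-1 / n, mu_i * h),              # one element moves to A_0
    ]
    return sum(value * prob for value, prob in table)

def expected_increment_closed_form(i, n, alpha, beta, h):
    """Last line of 2.3-12: h * (alpha - (alpha + beta) * x)."""
    x = i / n
    return h * (alpha - (alpha + beta) * x)

n, alpha, beta, h = 10, 0.3, 0.7, 1e-3
gaps = [
    expected_increment_from_table(i, n, alpha, beta, h)
    - expected_increment_closed_form(i, n, alpha, beta, h)
    for i in range(n + 1)
]
```

The two expressions agree for every state i, and the expected increment changes sign at x = $\alpha/(\alpha+\beta)$, which is why the mean is pulled toward that value.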

Dividing through by h and taking the limit as $h \to 0$ we find

2.3-14    $\lim_{h \to 0} \frac{E[x(t+h) - x(t)]}{h} = \lim_{h \to 0} \frac{E[x(t+h)] - E[x(t)]}{h}$

          $= \frac{dE[x(t)]}{dt} = \alpha - (\alpha+\beta) E[x(t)]$

The first equality follows from the fact that expectation is a linear
operator while the second equality follows by the definition of the
derivative of a function.

The solution to 2.3-14, as may be verified by substitution, is

2.3-15    $E[x(t)] = E[x(0)]\, e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta} \left[1 - e^{-(\alpha+\beta)t}\right]$.
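
As an alternative to verification by substitution, the solution 2.3-15 can be compared with a direct numerical integration of 2.3-14. This is a sketch only: forward Euler stepping, the step count, the function names, and the parameter values are illustrative assumptions.

```python
from math import exp

def mean_closed_form(t, m0, alpha, beta):
    """E[x(t)] from 2.3-15, with m0 = E[x(0)]."""
    decay = exp(-(alpha + beta) * t)
    return m0 * decay + alpha / (alpha + beta) * (1 - decay)

def mean_euler(t_end, m0, alpha, beta, steps=20000):
    """Integrate dE[x]/dt = alpha - (alpha + beta) E[x] by forward Euler."""
    h = t_end / steps
    m = m0
    for _ in range(steps):
        m += h * (alpha - (alpha + beta) * m)
    return m

m0, alpha, beta = 0.9, 0.3, 0.7
numerical = mean_euler(2.0, m0, alpha, beta)
analytical = mean_closed_form(2.0, m0, alpha, beta)
```

Both routes give a mean that starts at E[x(0)] and decays exponentially, at rate $\alpha+\beta$, toward $\alpha/(\alpha+\beta)$.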

2.3.3  Variance  of  the  Independent  Elements  Model 

It remains to consider the variance of the process. Since

$Var[x(t)] = E[x^2(t)] - E^2[x(t)]$

and since we have $E[x(t)]$ from 2.3-15, we now seek to develop $E[x^2(t)]$.
To do so we first examine $E[x^2(t+h)]$. It is easily seen that

2.3-16    $E[x^2(t+h)] = E[x^2(t)] + E[\{x(t+h) - x(t)\}^2] + 2E[\{x(t+h) - x(t)\}\, x(t)]$

First consider

$E[\{x(t+h) - x(t)\}^2 \mid x(t)]$.

The possible values of $\{\Delta x(t)\}^2$ and their associated probabilities
are given in Table 2-2.


TABLE 2-2
$\{\Delta x(t)\}^2$ CONDITIONAL ON $x(t)$
INDEPENDENT ELEMENTS MODEL

$\{\Delta x(t)\}^2$      $P\{\{\Delta x(t)\}^2 \mid x(t) = i/N\}$

$1/N^2$          $\lambda_i h = (N-i)\alpha h$

$0$              $1 - (\lambda_i + \mu_i) h$

$1/N^2$          $\mu_i h = i\beta h$


From Table 2-2 we have that

$E[\{x(t+h) - x(t)\}^2 \mid x(t)] = h N^{-2}\{(N-i)\alpha + i\beta\}$

$= h N^{-1}\{[1 - x(t)]\alpha + \beta x(t)\}$

$= h N^{-1}\{\alpha - (\alpha-\beta) x(t)\}$

Taking the expectation with respect to the distribution of x(t) then yields

$E[\{x(t+h) - x(t)\}^2] = h N^{-1}\{\alpha - (\alpha-\beta) E[x(t)]\}$

It remains to consider

$E[\{x(t+h) - x(t)\}\, x(t)]$.

The possible values of $\{\Delta x(t)\}\, x(t)$ and their associated probabilities are
given in Table 2-3.


TABLE 2-3
$x(t)\{\Delta x(t)\}$ CONDITIONAL ON $x(t)$
INDEPENDENT ELEMENTS MODEL

$x(t)\{\Delta x(t)\}$      $P\{x(t)\{\Delta x(t)\} \mid x(t) = i/N\}$

$x(t)/N$           $\lambda_i h = (N-i)\alpha h$

$0$                $1 - (\lambda_i + \mu_i) h$

$-x(t)/N$          $\mu_i h = i\beta h$


Using the results given in Table 2-3 we find

$E[\{x(t+h) - x(t)\}\, x(t) \mid x(t)] = h\, x(t) N^{-1}\{(N-i)\alpha - i\beta\}$

$= h\, x(t)\{[1 - x(t)]\alpha - \beta x(t)\}$

$= h\{\alpha x(t) - \alpha x^2(t) - \beta x^2(t)\}$

Taking the expectation with respect to the distribution of x(t), we
find

$E[\{x(t+h) - x(t)\}\, x(t)] = h\{\alpha E[x(t)] - (\alpha+\beta) E[x^2(t)]\}$

We now may write 2.3-16 as

$E[x^2(t+h)] = E[x^2(t)] + h N^{-1}\{\alpha - (\alpha-\beta) E[x(t)]\} + 2h\{\alpha E[x(t)] - (\alpha+\beta) E[x^2(t)]\}$

Subtracting $E[x^2(t)]$, dividing by h, and taking the limit as $h \to 0$ we
find

2.3-17    $\lim_{h \to 0} \frac{E[x^2(t+h)] - E[x^2(t)]}{h} = \frac{dE[x^2(t)]}{dt}$

          $= \frac{1}{N}\{\alpha - (\alpha-\beta) E[x(t)]\} + 2\alpha E[x(t)] - 2(\alpha+\beta) E[x^2(t)]$

In order to show the degeneracy of this model as $N \to \infty$, we now
consider $\lim_{N\to\infty} \frac{d\,E[x^2(t)]}{dt}$. Let a mark over an expectation,
written here as $\tilde{E}$, indicate that the limit as $N \to \infty$ has been taken.
We now have

2.3-18    $\lim_{N\to\infty} \frac{d\,E[x^2(t)]}{dt} = \frac{d\,\tilde{E}[x^2(t)]}{dt} = 2\alpha\,\tilde{E}[x(t)] - 2(\alpha+\beta)\,\tilde{E}[x^2(t)]$

In order to find $\mathrm{Var}[x(t)]$ we need to solve 2.3-18 for $\tilde{E}[x^2(t)]$.

We shall solve 2.3-18 by the method of undetermined coefficients.

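Assume a solution of the form

2.3-19    $\tilde{E}[x^2(t)] = a_0 + a_1 e^{-(\alpha+\beta)t} + a_2 e^{-2(\alpha+\beta)t}$

Differentiating 2.3-19 we obtain

2.3-20    $\frac{d\,\tilde{E}[x^2(t)]}{dt} = -(\alpha+\beta)\,a_1 e^{-(\alpha+\beta)t} - 2(\alpha+\beta)\,a_2 e^{-2(\alpha+\beta)t}$

Now substitute 2.3-15 and 2.3-19 into 2.3-18, noting that $E[x(t)] = \tilde{E}[x(t)]$
as is obvious from 2.3-15, since 2.3-15 does not depend on $N$. This yields

2.3-21    $\frac{d\,\tilde{E}[x^2(t)]}{dt} = 2\alpha\{E[x(0)]\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}(1-e^{-(\alpha+\beta)t})\}$

          $- 2(\alpha+\beta)\{a_0 + a_1 e^{-(\alpha+\beta)t} + a_2 e^{-2(\alpha+\beta)t}\}$ .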

Denote the initial condition for 2.3-19 as

2.3-22    $\tilde{E}[x^2(0)] = a_0 + a_1 + a_2$ .

Equating coefficients of similar terms in 2.3-20 and 2.3-21 yields two
additional equations on the undetermined coefficients.

2.3-23    $0 = \frac{2\alpha^2}{\alpha+\beta} - 2(\alpha+\beta)\,a_0$

          $-(\alpha+\beta)\,a_1 = 2\alpha E[x(0)] - \frac{2\alpha^2}{\alpha+\beta} - 2(\alpha+\beta)\,a_1$

Solving 2.3-22 and 2.3-23 for $a_0$, $a_1$ and $a_2$ we have

2.3-24    $a_0 = \frac{\alpha^2}{(\alpha+\beta)^2}$

          $a_1 = 2\alpha\{\frac{E[x(0)]}{\alpha+\beta} - \frac{\alpha}{(\alpha+\beta)^2}\}$

          $a_2 = \tilde{E}[x^2(0)] - a_0 - a_1$

          $= \tilde{E}[x^2(0)] + \frac{\alpha^2}{(\alpha+\beta)^2} - \frac{2\alpha}{\alpha+\beta} E[x(0)]$


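Hence we have for $\tilde{E}[x^2(t)]$ that

2.3-25    $\tilde{E}[x^2(t)] = \frac{\alpha^2}{(\alpha+\beta)^2} + 2\alpha\{\frac{E[x(0)]}{\alpha+\beta} - \frac{\alpha}{(\alpha+\beta)^2}\}\,e^{-(\alpha+\beta)t}$

          $+ \{\tilde{E}[x^2(0)] + \frac{\alpha^2}{(\alpha+\beta)^2} - \frac{2\alpha}{\alpha+\beta} E[x(0)]\}\,e^{-2(\alpha+\beta)t}$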


Using the results of 2.3-25 and 2.3-15 we now may compute, with a
bit of algebra, the variance of the process when the response
probability is a continuous random variable. The result is

2.3-26    $\widetilde{\mathrm{Var}}[x(t)] = \{\tilde{E}[x^2(0)] - E^2[x(0)]\}\,e^{-2(\alpha+\beta)t}$

It might be noted that

          $\widetilde{\mathrm{Var}}[x(0)] = \tilde{E}[x^2(0)] - E^2[x(0)]$

and thus $\widetilde{\mathrm{Var}}[x(t)] = \widetilde{\mathrm{Var}}[x(0)]\,e^{-2(\alpha+\beta)t}$.

From 2.3-26 we see that the variance tends to zero as the process
approaches the steady-state. In fact, the only variability in the
process is due to the variance in the initial response probability. We
have already noted that this is very unappealing in terms of our
intuitive notions concerning the process.
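
The degeneracy result can be checked numerically. The following is a minimal sketch, not part of the original derivation: it integrates Eqn. 2.3-18 with a classical Runge-Kutta step and compares the outcome with the closed form 2.3-25. The function names, the step count, and the sample parameter values in the usage note are illustrative assumptions.

```python
import math


def mean_x(t, alpha, beta, m0):
    """Mean response probability, Eqn. 2.3-15 (m0 = E[x(0)])."""
    c = alpha + beta
    return m0 * math.exp(-c * t) + (alpha / c) * (1.0 - math.exp(-c * t))


def second_moment_closed(t, alpha, beta, m0, s0):
    """Limiting second moment, Eqn. 2.3-25 (s0 = second moment at t = 0)."""
    c = alpha + beta
    p = alpha / c
    a0 = p * p                      # Eqn. 2.3-24
    a1 = 2.0 * p * (m0 - p)
    a2 = s0 - a0 - a1
    return a0 + a1 * math.exp(-c * t) + a2 * math.exp(-2.0 * c * t)


def second_moment_rk4(t_end, alpha, beta, m0, s0, steps=2000):
    """Integrate Eqn. 2.3-18 numerically (classical fourth-order Runge-Kutta)."""
    def rhs(t, s):
        return 2.0 * alpha * mean_x(t, alpha, beta, m0) - 2.0 * (alpha + beta) * s

    h = t_end / steps
    t, s = 0.0, s0
    for _ in range(steps):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2.0, s + h * k1 / 2.0)
        k3 = rhs(t + h / 2.0, s + h * k2 / 2.0)
        k4 = rhs(t + h, s + h * k3)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return s


def variance_closed(t, alpha, beta, m0, s0):
    """Limiting variance: second moment minus squared mean (compare Eqn. 2.3-26)."""
    return second_moment_closed(t, alpha, beta, m0, s0) - mean_x(t, alpha, beta, m0) ** 2
```

For example, with the assumed values alpha = 0.3, beta = 0.2, E[x(0)] = 0.4 and a second moment of 0.25 at time zero, `variance_closed` decays as the initial variance 0.09 times exp(-2(alpha+beta)t), which is the statement of 2.3-26.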

As a final comment on the Independent Elements Model, suppose we
consider an individual with a given initial response probability. Then
for this individual $E[x(0)] = x(0)$ and $E[x^2(0)] = x^2(0)$ and hence
$\mathrm{Var}[x(0)] = x^2(0) - x^2(0) = 0$. Thus he follows 2.3-15 deterministically
until his response probability equals $\alpha/(\alpha+\beta)$, where it remains
forevermore as far as the model is concerned.

In  the  next  chapter  we  shall  see  that  the  "degeneracy"  exhibited 
by  the  Independent  Elements  Model  may  be  rectified  by  an  assumption  of 
dependence  between  the  response  elements.   In  fact,  in  view  of  our 
pseudo-Bernoulli  assumption,  the  Cohesive  Elements  Model  yields  a 
particularly  satisfying  steady-state  distribution  when  the  number  of 
elements  increases  without  limit. 

2.4  Summary 

In this chapter we first considered some general notions and
measures which arise in latent Markov models. In addition, it was
explained that this report is titled A Probability Diffusion Model of
Dynamic Market Behavior in order to underscore our primary concern with
the infinite element form as a model of consumer brand choice. An
axiomatic foundation was then presented for the General Latent Markov
Model. The Independent Elements Model was formulated, analyzed, and
found unsatisfactory as a model of consumer brand choice.

Footnotes 

1.  See  Haines  (1964). 

2.  It  should  be  noted  that  this  system  of  differential  equations  is 
identical  to  the  ones  arrived  at  by  Feller  (1957,  pp.  420-421)  in 
the  power  supply  problem. 

3.  Massy  (1965)  has  presented  methods  for  estimating  the  linear  learning 
model  in  the  presence  of  heterogeneity  of  the  initial  response 
probabilities. 

4.  See  Parzen  (1962,  Chpt.  3)  for  a  more  complete  discussion  of  mean 
value  functions. 

5.  See Parzen (1960, p. 384).


Chapter  III 
THE  COHESIVE  ELEMENTS  MODEL 

In the previous chapter the Independent Elements Model was shown
to degenerate when the number of response elements increases without
limit. The degeneracy is in the sense that all individuals in the
population tend deterministically to the same response probability. In
the present chapter this situation is rectified by the development of
a birth-death process on the response elements in which the elements
are assumed to have a cohesive property. Thus the present model drops
the previous assumption that the response elements within an individual
behave independently.

In the present chapter we first develop the Cohesive Elements Model
and its associated steady-state distribution of response probability
for a finite number of elements. If the number of elements is then
allowed to increase without limit, the steady-state distribution of
response probability is shown to go to a beta distribution whose two
parameters are functions of the three parameters postulated in the
birth-death process on the response elements. We next consider the
mean and variance of the stochastic process which takes the response
probabilities forward in time. It is shown that the present model does
not degenerate in the sense of the Independent Elements Model. We then
develop the diffusion limit of the model, which shows the connection
of the Cohesive Elements Model to our discussion of probability
diffusion models in Chapter II. Aggregation and the development of
potential estimating equations are the next topics considered. Finally,
we examine methods of obtaining information on the cross-sectional
distribution of response probability at each response occasion. The
general problem of estimating the Cohesive Elements Model will be
reserved for Chapter IV.

3.1   Formulation  and  Steady-State  Distribution 

Once again, before we may consider the behavior of the model in the
steady-state, we must further specify the transition intensities of the
latent Markov process. For the present model we shall assume that:

(i)   each element associated with response $A_0$ has transition
      intensity $\alpha$ toward becoming associated with response $A_1$,

(ii)  each element associated with response $A_1$ has transition
      intensity $\beta$ toward becoming associated with response $A_0$,

(iii) the transition intensity of each element is increased by an
      amount $\gamma$ for each element associated with the opposite response.

Note that (i) and (ii) are identical to (ii) and (iii) in Section 2.3,
while (iii) replaces the independent elements assumption (i) of Section
2.3.

Assumption  (iii)  might  be  thought  of  as  an  assumption  of  cohesion 
or  attraction  between  the  response  elements. 

Using (i)-(iii) above and the axioms of Section 2.2 we may examine
the process both at the level of the response elements and at the level
of the individual respondent.¹ ² In diagrammatic form the process at the
level of the response elements is:

[Diagram: two states, $A_0$ and $A_1$, with single element transition
intensity $\alpha + i\gamma$ from $A_0$ to $A_1$ and $\beta + (N-i)\gamma$ from $A_1$ to $A_0$]

where $i$ = number of elements associated with response $A_1$,
      $N$ = total number of response elements.

Since each individual respondent is assumed to have $N$ response
elements, the two state process at the level of the elements induces an
$N+1$ state model at the level of the individual. The transition
intensities at the level of the individual respondent are, in the
notation of the general latent Markov model,


3.1-1    $\lambda_i$ = (single element transition intensity from $A_0$ to $A_1$)
                 × (number of response elements in state $A_0$)

         $= (\alpha + i\gamma)(N-i)$


Similarly,


3.1-2    $\mu_i = [\beta + (N-i)\gamma]\, i$


In diagram form we have at the level of the individual

[Diagram: states $0, 1, \ldots, N$ in a chain, with intensity $\lambda_i$ from state $i$
to $i+1$ and intensity $\mu_i$ from state $i$ to $i-1$]

where the state space is the number of elements, $i$, out of the total
of $N$ elements which are associated with response $A_1$.

If we substitute 3.1-1 and 3.1-2 into the general latent Markov
process equations 2.2-3, 2.2-4, and 2.2-5, we have

3.1-3    $\frac{dP_0(t)}{dt} = -N\alpha\, P_0(t) + [\beta + (N-1)\gamma]\, P_1(t)$

         $\frac{dP_i(t)}{dt} = -\{(N-i)(\alpha+i\gamma) + i[\beta+(N-i)\gamma]\}\, P_i(t)$

                 $+ (N-i+1)[\alpha+(i-1)\gamma]\, P_{i-1}(t)$

                 $+ (i+1)[\beta+(N-i-1)\gamma]\, P_{i+1}(t)$       for $0 < i < N$

         $\frac{dP_N(t)}{dt} = -N\beta\, P_N(t) + [\alpha+(N-1)\gamma]\, P_{N-1}(t)$


The steady-state solution of the system of differential equations
given as Eqn. 3.1-3 may be obtained by the simultaneous solution of
Eqn. 3.1-3 when $\frac{dP_i(t)}{dt} = 0$ for $i = 0, 1, \ldots, N$. The steady-state
derivation also makes use of the fact that

         $\sum_{i=0}^{N} P_i(t) = 1$.

Letting

         $a = \frac{\alpha}{\alpha+\beta}$  and  $c = \frac{\gamma}{\alpha+\beta}$

Coleman³ reports the solution as

3.1-4    $p_i = \binom{N}{i} \frac{\prod_{j=0}^{i-1}(a+jc)\ \prod_{j=0}^{N-i-1}(1-a+jc)}{\prod_{j=0}^{N-1}(1+jc)}$


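where we have dropped $t$ from $p_i(t)$ since we are now considering the
steady-state distribution. Coleman⁴ also gives a useful
reparameterization of 3.1-4 with $a = \lambda c$ (so that $\lambda = \alpha/\gamma$ and
$1/c = (\alpha+\beta)/\gamma$) as

3.1-5    $p_i = \binom{N}{i} \frac{\Gamma(\lambda+i)\ \Gamma(N+\frac{1}{c}-\lambda-i)\ \Gamma(\frac{1}{c})}{\Gamma(\lambda)\ \Gamma(N+\frac{1}{c})\ \Gamma(\frac{1}{c}-\lambda)}$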


where $\binom{N}{i} = \frac{N!}{i!\,(N-i)!}$ and $\Gamma(x)$ represents the gamma function of the
argument, $x$. Thus the steady-state values of the state probabilities
are given by 3.1-4 or 3.1-5. Coleman terms this distribution the
"contagious binomial." In 3.2 we shall examine the form of the
steady-state distribution when the number of elements increases without
limit. Coleman has unsuccessfully attempted to take this limit of 3.1-5
as $N \to \infty$. The logic behind such a limit process is that when $N \to \infty$,
the individual respondent then may be represented by what is essentially
a continuous state-space of response probability. When $N$ is finite,
the response probability $X(t) = i/N$ clearly takes on only a finite
number of discrete values. Hence letting $N \to \infty$ yields a more
satisfying model of the individual respondent.
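
As a numerical illustration of 3.1-4, the sketch below evaluates the contagious binomial directly from the product formula and, separately, from the balance condition $\lambda_i p_i = \mu_{i+1} p_{i+1}$ that the steady state of any birth-death process must satisfy. The function names and the parameter values in the usage note are illustrative assumptions, not part of Coleman's presentation.

```python
from math import comb


def rates(i, N, alpha, beta, gamma):
    """Birth and death intensities of Eqns. 3.1-1 and 3.1-2 for state i."""
    lam = (N - i) * (alpha + i * gamma)
    mu = i * (beta + (N - i) * gamma)
    return lam, mu


def contagious_binomial(N, alpha, beta, gamma):
    """Steady-state probabilities p_0..p_N from the product form of Eqn. 3.1-4."""
    a = alpha / (alpha + beta)
    c = gamma / (alpha + beta)
    denom = 1.0
    for j in range(N):
        denom *= 1.0 + j * c
    probs = []
    for i in range(N + 1):
        num = float(comb(N, i))
        for j in range(i):
            num *= a + j * c
        for j in range(N - i):
            num *= 1.0 - a + j * c
        probs.append(num / denom)
    return probs


def steady_state_by_balance(N, alpha, beta, gamma):
    """Solve lambda_i * p_i = mu_{i+1} * p_{i+1} recursively, then normalise."""
    weights = [1.0]
    for i in range(N):
        lam, _ = rates(i, N, alpha, beta, gamma)
        _, mu_next = rates(i + 1, N, alpha, beta, gamma)
        weights.append(weights[-1] * lam / mu_next)
    total = sum(weights)
    return [w / total for w in weights]
```

With, say, N = 10, alpha = 0.3, beta = 0.2 and gamma = 0.1, the two functions should return the same list of eleven probabilities up to rounding error, and the list should sum to one.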
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3.2  Limiting  Form  as  the  Number  of  Response  Elements  Goes  to  Infinity 

As  we  noted  at  the  end  of  the  previous  section,  we  are  particularly 
interested  in  the  behavior  of  the  Cohesive  Elements  Model  as  the  number 
of  response  elements  within  an  individual  respondent  increases  without 
limit.   In  particular,  in  this  section  we  shall  develop  the  steady-state 
distribution  of  response  probability  as  N  goes  to  infinity. 

In order to determine the function which will yield a continuous
probability density function for $X$ in the limit, we first discuss what
occurs as $N \to \infty$ in such a way that $X = i/N$ remains constant. The
latter restriction is included to ensure that the cumulative probability,
$P(X \le C)$, where $C$ is some constant between 0 and 1, remains constant as
$N \to \infty$.⁵

Suppose on the interval (0,1) that we have $N+1$ discrete points at
which a probability mass, $p(x = i/N)$, may occur. See Figure 3-1.

FIGURE 3-1
A PROBABILITY MASS FUNCTION

[Figure: probability masses $P(X = i/N)$ plotted at the points $i/N$, with
$P(X = 2/N)$ labeled and the horizontal axis running to $(N-1)/N$ and 1]

An alternative representation would be to express the probability mass
function $p(x)$ as a histogram in which $h(x)$ denotes the height of the
histogram at any point $X$. See Figure 3-2 for such a histogram. Note
that the width of each interval is $1/N$ and the area of the rectangle
about $x$ is the probability that the individual respondent has the
proportion $x = i/N$ of his $N$ response elements associated with response 1,
which we have denoted by $A_1$.

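FIGURE 3-2
A PROBABILITY HISTOGRAM

[Figure: histogram of height $h(X)$ with rectangles of width $1/N$ about the
points 0, $1/N$, $2/N$, ..., $(N-1)/N$, 1; the rectangle at $2/N$ has height
$h(2/N)$ and area $P(2/N) = \frac{1}{N} h(2/N)$]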


We have noted that the area of the histogram about any point $X = i/N$
represents the probability that $X$ is the proportion of the $N$ response
elements associated with response 1. That is,

         $p(x) = \frac{1}{N}\, h(x)$.

Hence we may write

         $h(x) = N\, p(x)$.

Suppose now that in the interval (0,1) we pack in more and more
elements, that is, we let $N$ get large. In this case $1/N$ will become
very small, and in the limit the $h(x)$ will be so close together that
they will trace out a continuous curve. This continuous curve is the
continuous p.d.f. of $X = i/N$ when $N \to \infty$ in such a way that $X = i/N$
remains constant. That is, where $C$ indicates that $X = i/N$ remains
constant,

3.2-1        $\lim_{N\to\infty,\ X=(i/N)=C} h(x) = \lim_{N\to\infty,\ X=(i/N)=C} N\, p(x) = f(X)$

Eqn. 3.2-1 gives us the function which we must take to the limit in
order to obtain the probability density function of $X$, $f(X)$.

For any fixed $N$, there is a direct equivalence between $p(x)$ in
Eqn. 3.2-1 and the $p_i$ given by Eqn. 3.1-5. Clearly, for some fixed
$N$, the event $X = i/N$ occurs if, and only if, the event $i$ occurs. Hence
for fixed $N$ the events $X = i/N$ and $i$ are equivalent and consequently
they have the same probability of occurrence. Thus $p(x) = p_i$. But
we have derived the steady-state distribution of $i$ for some fixed $N$
in Eqn. 3.1-5. We now may write 3.2-1 as


3.2-2         $f(x) = \lim_{N\to\infty,\ i/N=C} N\, p_i$

We need a result on the limiting behavior of gamma functions
before we may proceed with our derivation of the limit as $N \to \infty$. We
have that⁶

         $\Gamma(n + a) \sim n^{a}\, \Gamma(n)$  as $n \to \infty$

where the symbol $\sim$ signifies "behaves as." This result greatly
simplifies the limit process we use below.

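Recall that in Section 3.1 we found that the equilibrium
distribution of the process is

$p_i = \binom{N}{i} \frac{\Gamma(\frac{\alpha}{\gamma}+i)\ \Gamma(N+\frac{\alpha+\beta}{\gamma}-\frac{\alpha}{\gamma}-i)\ \Gamma(\frac{\alpha+\beta}{\gamma})}{\Gamma(\frac{\alpha}{\gamma})\ \Gamma(N+\frac{\alpha+\beta}{\gamma})\ \Gamma(\frac{\alpha+\beta}{\gamma}-\frac{\alpha}{\gamma})}$

    $= \binom{N}{i} \frac{\Gamma(\frac{\alpha}{\gamma}+i)\ \Gamma(N+\frac{\beta}{\gamma}-i)\ \Gamma(\frac{\alpha+\beta}{\gamma})}{\Gamma(\frac{\alpha}{\gamma})\ \Gamma(N+\frac{\alpha+\beta}{\gamma})\ \Gamma(\frac{\beta}{\gamma})}$

Then

3.2-3  $f(X) = \lim_{N\to\infty,\ X=(i/N)=C} N\, p_i$

    $= \lim_{N\to\infty,\ NX=i=NC} \frac{N\ \Gamma(N+1)\ \Gamma(NX+\frac{\alpha}{\gamma})\ \Gamma(N(1-X)+\frac{\beta}{\gamma})\ \Gamma(\frac{\alpha+\beta}{\gamma})}{\Gamma(NX+1)\ \Gamma(N(1-X)+1)\ \Gamma(\frac{\alpha}{\gamma})\ \Gamma(\frac{\beta}{\gamma})\ \Gamma(N+\frac{\alpha+\beta}{\gamma})}$

    $= \lim_{N\to\infty,\ NX=i} \frac{\Gamma(\frac{\alpha+\beta}{\gamma})\ N^2\, \Gamma(N)\ N^{\alpha/\gamma} X^{\alpha/\gamma}\, \Gamma(NX)\ N^{\beta/\gamma} (1-X)^{\beta/\gamma}\, \Gamma(N(1-X))}{\Gamma(\frac{\alpha}{\gamma})\ \Gamma(\frac{\beta}{\gamma})\ NX\, \Gamma(NX)\ N(1-X)\, \Gamma(N(1-X))\ N^{\alpha/\gamma} N^{\beta/\gamma}\, \Gamma(N)}$

  $f(X) = \frac{\Gamma(\frac{\alpha+\beta}{\gamma})}{\Gamma(\frac{\alpha}{\gamma})\ \Gamma(\frac{\beta}{\gamma})}\ X^{\frac{\alpha}{\gamma}-1}\ (1-X)^{\frac{\beta}{\gamma}-1}$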


which is a beta distribution. Hence the distribution of an individual's
response probability, $X$, in the steady-state and as $N \to \infty$, is the
beta distribution. When we consider the diffusion limit of the Cohesive
Elements Model in Section 3.4, we shall see that 3.2-3 satisfies the
necessary steady-state form of the Fokker-Planck equation.
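
The limit in 3.2-3 can be illustrated numerically. The sketch below evaluates $N p_i$ from the gamma-function form of the contagious binomial and compares it with the beta density having parameters $\alpha/\gamma$ and $\beta/\gamma$. Log-gamma arithmetic is used only to avoid overflow for large $N$; the function names and the sample values in the usage note are assumptions made for illustration.

```python
from math import exp, lgamma, log


def log_p(i, N, alpha, beta, gamma):
    """Log of the steady-state probability p_i in its gamma-function form."""
    p, q = alpha / gamma, beta / gamma
    log_binom = lgamma(N + 1) - lgamma(i + 1) - lgamma(N - i + 1)
    return (log_binom
            + lgamma(p + i) + lgamma(q + N - i) + lgamma(p + q)
            - lgamma(p) - lgamma(q) - lgamma(N + p + q))


def scaled_mass(x, N, alpha, beta, gamma):
    """Histogram height h(x) = N * p_i at the lattice point i nearest to x * N."""
    i = round(x * N)
    return N * exp(log_p(i, N, alpha, beta, gamma))


def beta_density(x, alpha, beta, gamma):
    """Limiting density f(X) of Eqn. 3.2-3, for 0 < x < 1."""
    p, q = alpha / gamma, beta / gamma
    log_norm = lgamma(p + q) - lgamma(p) - lgamma(q)
    return exp(log_norm + (p - 1.0) * log(x) + (q - 1.0) * log(1.0 - x))
```

With the assumed values alpha = 0.3, beta = 0.2 and gamma = 0.1 the limiting density is $12 X^2 (1-X)$, and `scaled_mass(0.5, N, ...)` should move toward `beta_density(0.5, ...)` as N is increased.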

It must be noted that all that has been shown here is that in the
steady-state the assumed birth-death process on the elements along with
the pseudo-Bernoulli assumption generates a beta distribution on the
response probability as the number of response elements increases
without limit. However, in view of the natural connection between a beta
distribution of the Bernoulli parameter and Bernoulli trials, it would
seem even in this postulated non-stationary Bernoulli universe that an
assumption of a beta law as the distribution of response probabilities
prior to the process reaching steady-state would not be an unreasonable
assumption. In further defense of the beta distribution, it should be
noted that the form of the beta is quite flexible in terms of the
different empirical distributions it can represent.

It is especially convenient to use the form of this limiting
distribution as the form of the distribution of $X$ at all times $t$ even
though steady-state has not yet been reached. This then restricts our
problem in estimating the distribution at time $t$ to that of estimating
the parameters of the distribution rather than first trying to fit a
functional form of this distribution. In the form in which we may
reasonably expect to obtain data, the task of estimating the
functional form of the transient distribution of $X$ would prove exceedingly
difficult, if not impossible.

In Section 3.6 we shall develop the first two raw moments of the
distribution of $X(t)$ at any time $t$. This development is free of any
assumptions as to the form of the distribution. However, since the
beta distribution is a two parameter distribution, the method developed
in Section 3.6 would enable us to fully specify the distribution if we
assume a beta form. As noted above, this does not seem unreasonable
in view of the fact that the steady-state distribution is a beta
distribution.

3.3  Mean  Value  Function  and  the  Variance  of  the  Change  Process 

The change process in the Cohesive Elements Model may be thought
of as a birth-death process on the response elements which is time
homogeneous and is constrained to the interval 0 to $N$. The process
might be termed a parentless birth-death process in that even if all
the elements are associated with response $A_0$ at a given time $t$, there
is still a positive probability $N\alpha\,\Delta t$ that there will be a birth in
the interval $(t, t+\Delta t)$; i.e., there is probability $N\alpha\,\Delta t$ that one
of the $N$ response elements associated with response $A_0$ will become
associated with response $A_1$ during the designated interval. That is,
state $X(t) = i/N = 0$ is not an absorbing state. In fact, the process
has no absorbing states.⁸

Recall that the number of response elements associated with response
1 is denoted by $i$. The birth rate of the process is $\lambda_i = (N-i)(\alpha + i\gamma)$
while the death rate is $\mu_i = i(\beta + (N-i)\gamma)$. It might be noted that
birth-death processes are nearest neighbor systems. That is, the
transition $i$ to $j$ where $|i-j| > 1$ occurs with probability zero for all
admissible $i$ and $j$. An equivalent way to express this property is to say
that the transition matrix of a birth-death process is a Jacobi matrix.

The next task is to develop relations which will describe the
evolution of $X(t)$ through time. Recall that $X(t) = i/N$ is the
probability that the individual will make response 1 at time $t$, where time
$t$ is a response occasion or event. The procedure will be to develop
equations for $E[X(t)]$ and $\mathrm{Var}[X(t)]$. It will be seen that the variance
and the raw second moments of this process at any time $t > 0$ do not
degenerate as the number of response elements increases without limit.
This is in direct contrast to the Independent Elements Model, the
noncontagious case, where the variance tended to zero as the number of
response elements increased without limit as was shown in Section 2.3.

Let $m(t) = E[X(t)]$. In the parlance of stochastic processes,
$m(t)$ is termed the "mean value function" of the process. See Parzen
(1962) Chapter 3 for a discussion of the mean value function of a
stochastic process. It is shown below that the postulated birth-death
process on the response elements generates a differential equation on
$m(t)$ of the form⁹

3.3-1    $\frac{d\,m(t)}{dt} = \alpha - (\alpha+\beta)\, m(t)$

Proof of 3.3-1

Let $h$ be a small increment of time such that no more than one
transition may occur in the interval from $t$ to $t+h$. Then by the axioms
M1-M5 the only possible values of $X(t+h)-X(t)$ and their associated
probabilities of occurrence are those given in Table 3-1.

TABLE 3-1
$\Delta X(t)$ CONDITIONAL ON $X(t)$
COHESIVE ELEMENTS MODEL

$\Delta X(t)$                  $P[\Delta X(t) \mid X(t) = i/N]$

$X(t+h)-X(t) = 1/N$            $\lambda_i h = (N-i)(\alpha + i\gamma)\,h$

$X(t+h)-X(t) = 0$              $1 - (\lambda_i + \mu_i)\,h$

$X(t+h)-X(t) = -1/N$           $\mu_i h = i\,[\beta+(N-i)\gamma]\,h$

In Table 3-1 we recall that we have $X(t) = i/N$ where $i$ is the number
of elements associated with response 1. For $h$ sufficiently small,
there are no other possible values of $\Delta X(t)$.
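
The rows of Table 3-1 can be turned into a small calculation. The sketch below, with illustrative function names and parameter values that are not part of the original text, builds the three (increment, probability) pairs for a given state and forms conditional moments of the increment from them. It makes visible that the cohesion parameter cancels out of the conditional mean increment but not out of the conditional mean squared increment.

```python
def increment_table(i, N, alpha, beta, gamma, h):
    """Rows of Table 3-1 as (increment, probability) pairs, given X(t) = i/N."""
    lam = (N - i) * (alpha + i * gamma)   # birth intensity lambda_i
    mu = i * (beta + (N - i) * gamma)     # death intensity mu_i
    return [
        (1.0 / N, lam * h),
        (0.0, 1.0 - (lam + mu) * h),
        (-1.0 / N, mu * h),
    ]


def conditional_moment(i, N, alpha, beta, gamma, h, power=1):
    """E[(X(t+h) - X(t))**power | X(t) = i/N], computed from the table rows."""
    rows = increment_table(i, N, alpha, beta, gamma, h)
    return sum((dx ** power) * prob for dx, prob in rows)
```

For instance, with the assumed values N = 20, i = 7, alpha = 0.3, beta = 0.2 and h = 0.001, `conditional_moment(..., power=1)` gives the same number whether gamma is 0.4 or 0, namely h times (alpha - (alpha + beta) i/N).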

Recall from probability theory (see, for example, Parzen (1960),
p. 384) that

         $E[Y] = \int E[Y \mid X = x]\ dF_X(x) = E_X\{E_{Y|X}[Y \mid X = x]\}$ .

The integral is a Riemann-Stieltjes integral and thus holds for both
mass functions and continuous probability density functions. Letting
$Y = X(t+h)-X(t) = \Delta X(t)$ and $X = X(t)$ in the above expression, the
expectation of $X(t+h)-X(t)$ may be expressed as

3.3-2    $E[X(t+h)-X(t)] = E_X\{E_{\Delta X|X(t)}[X(t+h)-X(t) \mid X(t) = i/N]\}$

But notice that $E_{\Delta X|X(t)}[X(t+h)-X(t) \mid X(t)]$ can be computed from the
previous results for $X(t+h)-X(t)$ and their associated probabilities.
See Table 3-1. We have

3.3-3    $E[X(t+h)-X(t) \mid X(t) = i/N] = \frac{h}{N}\{(N-i)\alpha + (N-i)i\gamma - i\beta - (N-i)i\gamma\}$

         $= \frac{h}{N}\{(N-i)\alpha - i\beta\} = h\{(1-\frac{i}{N})\alpha - \frac{i}{N}\beta\}$

         $= h\{\alpha-(\alpha+\beta)X(t)\}$

Substituting 3.3-3 into 3.3-2 we obtain

3.3-4    $E[X(t+h)-X(t)] = E_X\{E[X(t+h)-X(t) \mid X(t) = i/N]\}$

         $= E_X[h\{\alpha-(\alpha+\beta)X(t)\}]$

         $= h\{\alpha-(\alpha+\beta)E[X(t)]\}$

Dividing 3.3-4 by $h$ and taking the limit as $h \to 0$ results in

3.3-5    $\lim_{h\to 0} \frac{E[X(t+h)-X(t)]}{h} = \lim_{h\to 0} \frac{E[X(t+h)]-E[X(t)]}{h} = \frac{d\,E[X(t)]}{dt} = \alpha-(\alpha+\beta)E[X(t)]$

The first equality follows from the fact that expectation is a linear
operator while the second equality follows from the definition of the
derivative of a function. In terms of the notation $m(t) = E[X(t)]$,
equation 3.3-5 may be expressed as

3.3-6    $\frac{d\,m(t)}{dt} = \alpha-(\alpha+\beta)\, m(t)$

which is just equation 3.3-1. This completes the proof of 3.3-1.

Subsequent development of the model will require knowledge of the
solution of 3.3-6. The solution, as may be easily verified by
substitution for $m(t)$ in 3.3-6, is

3.3-7         $m(t) = m(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\,(1-e^{-(\alpha+\beta)t})$

or in terms of the expectation operator

3.3-8         $E[X(t)] = E[X(0)]\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\,(1-e^{-(\alpha+\beta)t})$

where $E[X(0)] = m(0)$ expresses the initial conditions in the above
equations.
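
Because the differential equation for the mean does not involve $N$ or $\gamma$, the solution 3.3-8 should hold for a finite number of elements as well. The sketch below checks this numerically by integrating the state equations 3.1-3 with a classical Runge-Kutta step and comparing the resulting mean of $i/N$ with 3.3-8. The function names, the step count, and the parameter values in the usage note are illustrative assumptions.

```python
import math


def rates(N, alpha, beta, gamma):
    """Lists of lambda_i and mu_i (Eqns. 3.1-1, 3.1-2) for i = 0..N."""
    lam = [(N - i) * (alpha + i * gamma) for i in range(N + 1)]
    mu = [i * (beta + (N - i) * gamma) for i in range(N + 1)]
    return lam, mu


def master_rhs(P, lam, mu):
    """Right-hand side of Eqn. 3.1-3; lam[N] = 0 and mu[0] = 0 handle the ends."""
    N = len(P) - 1
    dP = []
    for i in range(N + 1):
        flow = -(lam[i] + mu[i]) * P[i]
        if i > 0:
            flow += lam[i - 1] * P[i - 1]
        if i < N:
            flow += mu[i + 1] * P[i + 1]
        dP.append(flow)
    return dP


def integrate_master(P0, lam, mu, t_end, steps=4000):
    """Classical fourth-order Runge-Kutta integration of the state probabilities."""
    h = t_end / steps
    P = list(P0)
    for _ in range(steps):
        k1 = master_rhs(P, lam, mu)
        k2 = master_rhs([p + 0.5 * h * k for p, k in zip(P, k1)], lam, mu)
        k3 = master_rhs([p + 0.5 * h * k for p, k in zip(P, k2)], lam, mu)
        k4 = master_rhs([p + h * k for p, k in zip(P, k3)], lam, mu)
        P = [p + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P


def mean_from_distribution(P):
    """E[X(t)] = sum over i of (i/N) * P_i(t)."""
    N = len(P) - 1
    return sum((i / N) * p for i, p in enumerate(P))


def mean_closed_form(t, alpha, beta, m0):
    """Eqn. 3.3-8 with m0 = E[X(0)]."""
    c = alpha + beta
    return m0 * math.exp(-c * t) + (alpha / c) * (1.0 - math.exp(-c * t))
```

As a usage example under assumed values, take N = 8, alpha = 0.3, beta = 0.2, gamma = 0.5 and start with all probability on i = 2, so that E[X(0)] = 0.25; the mean obtained from `integrate_master` at t = 1.5 should agree with `mean_closed_form(1.5, 0.3, 0.2, 0.25)` to within the integration error.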

Suppose that an individual has probability $X(0)$ of making response
$A_1$ at time 0. Then given this probability, one can express $E[X(0)] = X(0)$
since $X(0)$ is now a degenerate random variable with the following
probability density function:

         $P(X(t=0) = X(0)) = 1$

         $P(X(t=0) = \text{any other value}) = 0$

Note that in this case $E[X^2(0)] = X^2(0)$ and hence
$\mathrm{Var}[X(0)] = E[X^2(0)] - E^2[X(0)] = X^2(0) - X^2(0) = 0$. These results will
be of particular importance when individuals are aggregated to obtain
equations for estimating the change process coefficients, $\alpha$ and $\beta$.
One may now rewrite 3.3-8 for an individual, given that
individual's initial probability of making response 1, $X(0)$. That is,

3.3-8a      $E[X(t) \mid X(0)] = X(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\,(1-e^{-(\alpha+\beta)t})$

It remains to develop $\mathrm{Var}[X(t)]$ and to show that this variance does
not degenerate (i.e., go to zero) as the number of elements increases
indefinitely. It is well known that

3.3-9        $\mathrm{Var}[X(t)] = E[X^2(t)] - E^2[X(t)]$ .

The value of $E[X(t)]$ is given in equation 3.3-8. Hence if it is
possible to solve for $E[X^2(t)]$, then $\mathrm{Var}[X(t)]$ may be computed. Consider

3.3-10    $E[X^2(t+h)] = E[\{X(t+h)-X(t)+X(t)\}^2]$

          $= E[X^2(t) + 2X(t)\{X(t+h)-X(t)\} + \{X(t+h)-X(t)\}^2]$

          $= E[X^2(t)] + E[\{X(t+h)-X(t)\}^2] + 2E[X(t)\{X(t+h)-X(t)\}]$

First take

3.3-11    $E[\{X(t+h)-X(t)\}^2] = E_X\, E_{\Delta X|X}[\{X(t+h)-X(t)\}^2 \mid X(t)]$


Once again let $h$ be a small increment of time such that no more than
one element may make a transition (i.e., become associated with the
opposite state) in the interval $t$ to $t+h$. Then the only possible
values of $\{X(t+h)-X(t)\}^2$ given $X(t)$ which may occur with non-zero
probability are those given in Table 3-2.


TABLE 3-2
$[\Delta X(t)]^2$ CONDITIONAL ON $X(t)$
COHESIVE ELEMENTS MODEL

$[\Delta X(t)]^2$                  $P[\{\Delta X(t)\}^2 \mid X(t) = i/N]$

$[X(t+h)-X(t)]^2 = 1/N^2$          $\lambda_i h = (N-i)(\alpha+i\gamma)\,h$

$[X(t+h)-X(t)]^2 = 0$              $1 - (\lambda_i + \mu_i)\,h$

$[X(t+h)-X(t)]^2 = 1/N^2$          $\mu_i h = [\beta+(N-i)\gamma]\,h\,i$


Using the results in Table 3-2 we find

3.3-12    $E[\{X(t+h)-X(t)\}^2 \mid X(t)] = \frac{h}{N^2}\{(N-i)(\alpha+i\gamma) + i[\beta+(N-i)\gamma]\}$

          $= h\{(1-X(t))[\frac{\alpha}{N} + \gamma X(t)] + X(t)[\frac{\beta}{N} + \gamma(1-X(t))]\}$

          $= \frac{h}{N}\{\alpha(1-X(t)) + \beta X(t)\} + h\gamma\{X(t)-X^2(t)+X(t)-X^2(t)\}$

          $= \frac{h}{N}\{\alpha-(\alpha-\beta)X(t)\} + 2h\gamma\{X(t)-X^2(t)\}$

Inserting 3.3-12 into 3.3-11 yields

3.3-13    $E[\{X(t+h)-X(t)\}^2] = \frac{h}{N}\{\alpha-(\alpha-\beta)E[X(t)]\} + 2h\gamma\{E[X(t)] - E[X^2(t)]\}$


Finally, examine the last term on the right hand side of 3.3-10, that is
$E[\{X(t+h)-X(t)\}X(t)]$. Consider first $E[\{X(t+h)-X(t)\}X(t) \mid X(t)]$. The
possible values of $X(t)[\Delta X(t)]$ and their associated probabilities
conditional upon $X(t) = i/N$ are given in Table 3-3.

TABLE 3-3
$X(t)[\Delta X(t)]$ CONDITIONAL ON $X(t)$
COHESIVE ELEMENTS MODEL

$X(t)[\Delta X(t)]$                      $P[X(t)\{\Delta X(t)\} \mid X(t) = i/N]$

$[X(t+h)-X(t)]X(t) = X(t)/N$             $\lambda_i h = (N-i)(\alpha+i\gamma)\,h$

$[X(t+h)-X(t)]X(t) = 0$                  $1 - (\lambda_i + \mu_i)\,h$

$[X(t+h)-X(t)]X(t) = -X(t)/N$            $\mu_i h = i\,[\beta+(N-i)\gamma]\,h$

Using the results given in Table 3-3, we find

$E[\{X(t+h)-X(t)\}X(t) \mid X(t)] = \frac{h X(t)}{N}\{(N-i)(\alpha+i\gamma) - i[\beta+(N-i)\gamma]\}$

          $= h X(t)\{[1-X(t)](\alpha+i\gamma) - \beta X(t) - [1-X(t)]\,i\gamma\}$

          $= h X(t)\{[1-X(t)]\alpha + [1-X(t)]\,i\gamma - [1-X(t)]\,i\gamma - \beta X(t)\}$

          $= h X(t)\{\alpha - (\alpha+\beta) X(t)\}$

          $= h\{\alpha X(t) - (\alpha+\beta) X^2(t)\}$

Hence

3.3-14    $E[\{X(t+h)-X(t)\}X(t)] = h\{\alpha E[X(t)] - (\alpha+\beta)E[X^2(t)]\}$

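Using 3.3-13 and 3.3-14, we have for 3.3-10 that

$E[X^2(t+h)] = E[X^2(t)] + \frac{h}{N}\{\alpha-(\alpha-\beta)E[X(t)]\} + 2h\gamma\{E[X(t)]-E[X^2(t)]\}$

          $+ 2h\{\alpha E[X(t)] - (\alpha+\beta)E[X^2(t)]\}$

Now take

$\frac{E[X^2(t+h)]-E[X^2(t)]}{h} = \frac{\alpha}{N} - E[X(t)]\{\frac{\alpha}{N} - \frac{\beta}{N} - 2\gamma - 2\alpha\} - E[X^2(t)]\{2\gamma+2\alpha+2\beta\}$

Then,

3.3-15    $\lim_{h\to 0} \frac{E[X^2(t+h)]-E[X^2(t)]}{h} = \frac{d\,E[X^2(t)]}{dt}$

          $= \frac{\alpha}{N} - (\frac{\alpha}{N} - \frac{\beta}{N})E[X(t)] + 2(\alpha+\gamma)E[X(t)] - 2(\alpha+\beta+\gamma)E[X^2(t)]$

Once again let $\tilde{E}[X^2(t)]$ denote $\lim_{N\to\infty} E[X^2(t)]$.
Then equation 3.3-15 becomes

3.3-16    $\lim_{N\to\infty} \frac{d\,E[X^2(t)]}{dt} = \frac{d\,\tilde{E}[X^2(t)]}{dt} = 2(\alpha+\gamma)\,\tilde{E}[X(t)] - 2(\alpha+\beta+\gamma)\,\tilde{E}[X^2(t)]$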


Equation 3.3-16 may be solved by the method of undetermined coefficients.
Assume the trial solution

3.3-17        $\tilde{E}[X^2(t)] = a_0 + a_1 e^{-(\alpha+\beta)t} + a_2 e^{-2(\alpha+\beta+\gamma)t}$

where $a_0$, $a_1$, and $a_2$ are arbitrary and as yet undetermined coefficients.
Differentiating 3.3-17 we find

3.3-18        $\frac{d\,\tilde{E}[X^2(t)]}{dt} = -(\alpha+\beta)\,a_1 e^{-(\alpha+\beta)t} - 2(\alpha+\beta+\gamma)\,a_2 e^{-2(\alpha+\beta+\gamma)t}$


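Then substitute the trial solution 3.3-17 into 3.3-16 in conjunction with
3.3-8. This yields

3.3-19    $\frac{d\,\tilde{E}[X^2(t)]}{dt} = 2(\alpha+\gamma)\{E[X(0)]\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}(1-e^{-(\alpha+\beta)t})\}$

          $- 2(\alpha+\beta+\gamma)\{a_0 + a_1 e^{-(\alpha+\beta)t} + a_2 e^{-2(\alpha+\beta+\gamma)t}\}$

          $= \frac{2\alpha(\alpha+\gamma)}{\alpha+\beta} - 2(\alpha+\beta+\gamma)\,a_0$

          $+ [\frac{2(\alpha+\gamma)}{\alpha+\beta}\{(\alpha+\beta)E[X(0)] - \alpha\} - 2(\alpha+\beta+\gamma)\,a_1]\,e^{-(\alpha+\beta)t} - 2(\alpha+\beta+\gamma)\,a_2 e^{-2(\alpha+\beta+\gamma)t}$

By equating coefficients of similar terms in 3.3-18 and 3.3-19 and
noting the initial condition $\tilde{E}[X^2(0)]$ we have three independent
equations for the three unknown coefficients. These equations are

3.3-20    $0 = \frac{2\alpha(\alpha+\gamma)}{\alpha+\beta} - 2(\alpha+\beta+\gamma)\,a_0$

          $-(\alpha+\beta)\,a_1 = \frac{2(\alpha+\gamma)}{\alpha+\beta}\{(\alpha+\beta)E[X(0)] - \alpha\} - 2(\alpha+\beta+\gamma)\,a_1$

          $\tilde{E}[X^2(0)] = a_0 + a_1 + a_2$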


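The solution of 3.3-20 is clearly

3.3-21    $a_0 = \frac{\alpha(\alpha+\gamma)}{(\alpha+\beta)(\alpha+\beta+\gamma)}$

          $a_1 = \frac{2(\alpha+\gamma)\{(\alpha+\beta)E[X(0)] - \alpha\}}{(\alpha+\beta)\{\alpha+\beta+2\gamma\}}$

          $a_2 = \tilde{E}[X^2(0)] - a_0 - a_1$

          $= \tilde{E}[X^2(0)] - \frac{\alpha(\alpha+\gamma)}{(\alpha+\beta)(\alpha+\beta+\gamma)} - \frac{2(\alpha+\gamma)}{(\alpha+\beta)} \frac{\{(\alpha+\beta)E[X(0)] - \alpha\}}{\{\alpha+\beta+2\gamma\}}$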


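Hence we have as the solution of 3.3-16

3.3-22    $\tilde{E}[X^2(t)] = \frac{\alpha(\alpha+\gamma)}{(\alpha+\beta)(\alpha+\beta+\gamma)} + \frac{2(\alpha+\gamma)}{(\alpha+\beta)} \frac{\{(\alpha+\beta)E[X(0)] - \alpha\}}{\{\alpha+\beta+2\gamma\}}\, e^{-(\alpha+\beta)t}$

          $+ \{\tilde{E}[X^2(0)] - \frac{\alpha(\alpha+\gamma)}{(\alpha+\beta)(\alpha+\beta+\gamma)} - \frac{2(\alpha+\gamma)}{(\alpha+\beta)} \frac{\{(\alpha+\beta)E[X(0)] - \alpha\}}{\{\alpha+\beta+2\gamma\}}\}\, e^{-2(\alpha+\beta+\gamma)t}$

Using 3.3-22 and 3.3-8 we may compute the variance of the response
probability at any time $t$. After some tedious algebra we have

3.3-23    $\widetilde{\mathrm{Var}}[X(t)] = \tilde{E}[X^2(t)] - E^2[X(t)]$

          $= \frac{\alpha\beta\gamma}{(\alpha+\beta)^2(\alpha+\beta+\gamma)} + \frac{2\gamma(\alpha-\beta)}{(\alpha+\beta)(\alpha+\beta+2\gamma)}\{\frac{\alpha}{\alpha+\beta} - E[X(0)]\}\, e^{-(\alpha+\beta)t}$

          $+ \{\tilde{E}[X^2(0)] - \frac{\alpha(\alpha+\gamma)}{(\alpha+\beta)(\alpha+\beta+\gamma)} - \frac{2(\alpha+\gamma)}{(\alpha+\beta)} \frac{\{(\alpha+\beta)E[X(0)] - \alpha\}}{\{\alpha+\beta+2\gamma\}}\}\, e^{-2(\alpha+\beta+\gamma)t}$

          $- \{E^2[X(0)] - \frac{2\alpha}{\alpha+\beta}E[X(0)] + (\frac{\alpha}{\alpha+\beta})^2\}\, e^{-2(\alpha+\beta)t}$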


In general $\widetilde{\mathrm{Var}}[X(t)] > 0$ for all $t \ge 0$ as given in 3.3-23 for the infinite
number of elements case. This is in direct contrast to the Independent
Elements Model in which the variance of the process was zero when $N \to \infty$.
In the case of the Cohesive Elements Model we have a true stochastic
change process operating on the state space of response probabilities.
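
The limiting second moment and variance can be evaluated directly from the coefficients of 3.3-21. The sketch below is an illustration under assumed names and sample values, not the estimation procedure of the report. It also records the long-run value of the variance alongside the standard variance formula of a beta distribution with parameters $p$ and $q$, namely $pq/((p+q)^2(p+q+1))$, so that the two can be compared for $p = \alpha/\gamma$ and $q = \beta/\gamma$.

```python
import math


def mean_x(t, alpha, beta, m0):
    """Eqn. 3.3-8 with m0 = E[X(0)]."""
    c = alpha + beta
    return m0 * math.exp(-c * t) + (alpha / c) * (1.0 - math.exp(-c * t))


def coefficients(alpha, beta, gamma, m0, s0):
    """a0, a1, a2 of Eqn. 3.3-21 (s0 = limiting second moment at t = 0)."""
    c = alpha + beta
    a0 = alpha * (alpha + gamma) / (c * (c + gamma))
    a1 = 2.0 * (alpha + gamma) * (c * m0 - alpha) / (c * (c + 2.0 * gamma))
    a2 = s0 - a0 - a1
    return a0, a1, a2


def second_moment(t, alpha, beta, gamma, m0, s0):
    """Limiting second moment, Eqn. 3.3-22."""
    a0, a1, a2 = coefficients(alpha, beta, gamma, m0, s0)
    c = alpha + beta
    return a0 + a1 * math.exp(-c * t) + a2 * math.exp(-2.0 * (c + gamma) * t)


def variance(t, alpha, beta, gamma, m0, s0):
    """Limiting variance: second moment minus squared mean (compare Eqn. 3.3-23)."""
    return second_moment(t, alpha, beta, gamma, m0, s0) - mean_x(t, alpha, beta, m0) ** 2


def long_run_variance(alpha, beta, gamma):
    """Value approached by the variance as t grows: a0 - (alpha/(alpha+beta))**2."""
    c = alpha + beta
    return alpha * beta * gamma / (c * c * (c + gamma))


def beta_variance(p, q):
    """Variance of a beta distribution with parameters p and q."""
    return p * q / ((p + q) ** 2 * (p + q + 1.0))
```

With the assumed values alpha = 0.3, beta = 0.2, gamma = 0.1, `long_run_variance` and `beta_variance(3.0, 2.0)` both evaluate to 0.04, and for an individual with a known starting probability (s0 equal to m0 squared) `variance` starts at zero and rises toward that value.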

It is clear from 3.3-23 that the variance in the process in the
steady-state is

3.3-24    $\mathrm{Var}[X] = \frac{\alpha\beta\gamma}{(\alpha+\beta)^2(\alpha+\beta+\gamma)}$

This is just the variance of the steady-state distribution of response
probability for the infinite element form of the Cohesive Elements Model.
It can be shown that this variance is identical to that of the beta
distribution derived in Section 3.2.
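
The identity with the beta variance can be checked in a few lines. The sketch below assumes that the beta law of Section 3.2 has parameters $A = \alpha/\gamma$ and $B = \beta/\gamma$ (the form reproduced later as 3.4-14) and uses the standard variance formula for a beta distribution.

```latex
\begin{align*}
\mathrm{Var}[X] &= \frac{AB}{(A+B)^2(A+B+1)},
  \qquad A = \frac{\alpha}{\gamma},\quad B = \frac{\beta}{\gamma} \\[4pt]
&= \frac{\alpha\beta/\gamma^2}
        {\dfrac{(\alpha+\beta)^2}{\gamma^2}\cdot\dfrac{\alpha+\beta+\gamma}{\gamma}} \\[4pt]
&= \frac{\alpha\beta\gamma}{(\alpha+\beta)^2(\alpha+\beta+\gamma)} .
\end{align*}
```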

3.4  The  Diffusion  Limit  of  the  Process 

Suppose that X(t) is a random variable which depends upon the
continuous time parameter, t. Further suppose that this random variable
is of the Markov type, that is, future changes depend upon the present
state but not upon the realized history of outcomes which led to this
state. Now the stochastic process $\{X(t),\ t \ge 0\}$ may be of two types:
discontinuous or diffusion. Discontinuous processes are characterized
by the fact that the probability of a change in the interval (t, t+h),
where h is very small, is of the order of magnitude of h (i.e., very
small). However, in a discontinuous process, when a change does occur,
it is finite in magnitude. In contrast to processes of the
discontinuous type are diffusion processes, in which X(t) changes
continuously. In diffusion processes, no matter how small the interval
(t, t+h), X(t) will undergo some change. This change is practically
certain to be small for small intervals. In formal terms this says
that, for any fixed $\varepsilon > 0$,

          $P\{|X(t+h) - X(t)| > \varepsilon\} < O(h)$

where again the symbol O(h) stands for "of the order of the argument,"
which in this case is h.

For a diffusion process the infinitesimal mean displacement a(X) and
the infinitesimal variance 2b(X) play a fundamental role. These
quantities are defined as follows:

3.4-1     $\lim_{h \to 0} \frac{E[X(t+h) - X(t) \mid X(t)]}{h} = a(X)$

3.4-2     $\lim_{h \to 0} \frac{\mathrm{Var}[X(t+h) - X(t) \mid X(t)]}{h} = 2b(X)$


Let f(t,X) denote the probability density function of X(t).
Kolmogorov [11] has shown that f(t,X) must satisfy the Fokker-Planck
diffusion equation

3.4-3     $\frac{\partial f(t,X)}{\partial t} = \frac{\partial^2 [b(X) f(t,X)]}{\partial X^2} - \frac{\partial [a(X) f(t,X)]}{\partial X}$

where a(X) and b(X) are the quantities defined in 3.4-1 and 3.4-2. For
excellent references on diffusion processes and continuous time
stochastic processes, see Feller (1950) and Bharucha-Reid (1960).

The Cohesive Elements Model may be represented by a diffusion
process as the number of response elements, N, increases without limit.
In this case each individual in the population has an infinite number
of response elements undergoing the change postulated in the assumed
process. Recall that in Section 3.2 it was shown that the process
generates a steady-state distribution of response probabilities which
follows a beta law. The beta distribution derived in Section 3.2 must
satisfy the Fokker-Planck diffusion equation 3.4-3 in the steady-state,
that is, when $\partial f(t,X)/\partial t = 0$. The remainder of this section will
consider the derivation of the infinitesimal mean displacement and
variance of the process and demonstrate that the beta law previously
derived does indeed satisfy the diffusion equation of the process in
the steady-state.

First, consider the infinitesimal mean displacement, a(X). Recall
that in 3.3-3 it was shown that

3.4-4     $E[X(t+h) - X(t) \mid X(t) = i/N] = h[\alpha - (\alpha+\beta)X(t)]$

From this it is easily seen that

3.4-5     $\lim_{h \to 0} \frac{E[X(t+h) - X(t) \mid X(t) = i/N]}{h} = \alpha - (\alpha+\beta)X(t) = a(X)$

Hence  the  infinitesimal  mean  displacement  has  the  same  form  as  the 
previously  derived  differential  equation  on  the  mean  value  function 
of  the  response  probability,  X(t).  Note  that  3.4-5  shows  that  the 
mild  regularity  assumption  implicit  in  3.4-1  is  met  in  this  model. 

It  remains  to  consider  the  infinitesimal  variance.  The  variance 
may  be  expressed  as  follows: 

3.4-6     $\mathrm{Var}[X(t+h) - X(t) \mid X(t)] = E\{[X(t+h) - X(t)]^2 \mid X(t)\} - E^2[X(t+h) - X(t) \mid X(t)]$

The second term on the right hand side of 3.4-6 is just the square of
equation 3.4-4 given above. That is,

3.4-7     $E^2[X(t+h) - X(t) \mid X(t)] = \{h[\alpha - (\alpha+\beta)X(t)]\}^2 = h^2[\alpha - (\alpha+\beta)X(t)]^2$

There is no need to expand 3.4-7 further since it will be seen below
that it vanishes in the limit as $h \to 0$.

In order to develop the infinitesimal variance it remains to
consider $E\{[X(t+h) - X(t)]^2 \mid X(t)\}$. In Section 3.3 we found this value as
3.3-12. This equation is reproduced as 3.4-8 in this section for
convenience of discussion.

3.4-8     $E\{[X(t+h) - X(t)]^2 \mid X(t)\} = \frac{h}{N}\{[1 - X(t)]\alpha + X(t)\beta\} + 2h\gamma[X(t) - X^2(t)]$
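
The two conditional moments used in this section can be reproduced from element-level transition rates. The sketch below assumes one parametrization that is consistent with 3.4-4 and 3.4-8: with i of the N elements associated with response $A_1$, an upward step of size 1/N occurs at rate $(N-i)(\alpha + \gamma i)$ and a downward step at rate $i(\beta + \gamma(N-i))$. These rates are an assumption made for illustration only, since the statement of the rates in Section 3.1 is not reproduced here; the function names are likewise hypothetical.

```python
def conditional_moments(i, N, alpha, beta, gamma, h):
    """First-order-in-h conditional moments of X(t+h) - X(t) given X(t) = i/N,
    under the ASSUMED element-level rates described in the lead-in."""
    up = (N - i) * (alpha + gamma * i)        # assumed rate of i -> i + 1
    down = i * (beta + gamma * (N - i))       # assumed rate of i -> i - 1
    step = 1.0 / N                            # size of one jump in X
    mean = h * (up - down) * step             # E[dX | X]
    second = h * (up + down) * step * step    # E[dX^2 | X]
    return mean, second

def mean_3_4_4(x, alpha, beta, h):
    """Right hand side of 3.4-4."""
    return h * (alpha - (alpha + beta) * x)

def second_moment_3_4_8(x, N, alpha, beta, gamma, h):
    """Right hand side of 3.4-8."""
    return (h / N) * ((1 - x) * alpha + x * beta) + 2 * h * gamma * (x - x * x)
```

Under these assumed rates the 1/N term in 3.4-8 comes from the spontaneous part of the rates and disappears as N grows, while the $\gamma$ term survives.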

Now the diffusion process holds exactly when $N \to \infty$ and hence when the
state space of the Markov process becomes infinite. In this case the
response probability at time t becomes a continuous random variable.
Let $E\{[X(t+h) - X(t)]^2 \mid X(t)\}$ denote the value of 3.4-8 as $N \to \infty$. Then,

3.4-9     $E\{[X(t+h) - X(t)]^2 \mid X(t)\} = 2h\gamma[X(t) - X^2(t)]$

This is the value of $E\{[X(t+h) - X(t)]^2 \mid X(t)\}$ to use in equation 3.4-6. Then
substituting 3.4-9 and 3.4-7 into 3.4-6 we obtain

3.4-10    $\mathrm{Var}[X(t+h) - X(t) \mid X(t)] = 2h\gamma[X(t) - X^2(t)] - h^2[\alpha - (\alpha+\beta)X(t)]^2$

Hence

          $2b(X) = \lim_{h \to 0} \frac{\mathrm{Var}[X(t+h) - X(t) \mid X(t)]}{h} = 2\gamma[X(t) - X^2(t)]$

In summary, the infinitesimal mean displacement and variance of the
process are

3.4-11    $a(X) = \alpha - (\alpha+\beta)X(t)$

3.4-12    $2b(X) = 2\gamma[X(t) - X^2(t)]$
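
To make the diffusion defined by 3.4-11 and 3.4-12 concrete, the following Python sketch simulates sample paths with a simple Euler-Maruyama scheme. This discretization is not part of the text: it is a standard numerical device substituted here for illustration, and the clipping of X to [0, 1] after each step is a crude assumption to keep the square root defined. The function name, seed, and parameter values are hypothetical.

```python
import math
import random

def simulate_steady_state(alpha, beta, gamma, x0, t_end, dt, n_paths, seed=0):
    """Euler-Maruyama paths of dX = a(X) dt + sqrt(2 b(X)) dW, with
    a(X) = alpha - (alpha + beta) X and 2 b(X) = 2 gamma (X - X^2).
    Returns the value of each path at time t_end."""
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    sqrt_dt = math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            drift = alpha - (alpha + beta) * x
            diffusion = math.sqrt(max(2.0 * gamma * x * (1.0 - x), 0.0))
            x += drift * dt + diffusion * sqrt_dt * rng.gauss(0.0, 1.0)
            x = min(1.0, max(0.0, x))  # crude clipping; an assumption, not in the text
        finals.append(x)
    return finals
```

For a time horizon that is long relative to $1/(\alpha+\beta)$, the sample mean of the returned values should lie near $\alpha/(\alpha+\beta)$, and individuals started from a common X(0) should end up spread out over the unit interval, which is the heterogeneity the model is meant to capture.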

Now letting f(t,X) denote the probability density function of
X(t), the above discussion has indicated that in the steady-state f(t,X)
must satisfy the following form of the Fokker-Planck equation:

3.4-13    $0 = \frac{\partial^2 [b(X) f(X)]}{\partial X^2} - \frac{\partial [a(X) f(X)]}{\partial X}$

where the time variable t has been dropped since only steady-state
conditions are being considered.

Recall that in Section 3.2 it was shown that if $N \to \infty$ in Coleman's
contagious binomial distribution, the distribution of the response
probability, X(t), will be

3.4-14    $f(X) = \frac{\Gamma\left(\frac{\alpha+\beta}{\gamma}\right)}{\Gamma\left(\frac{\alpha}{\gamma}\right)\Gamma\left(\frac{\beta}{\gamma}\right)}\, X^{\frac{\alpha}{\gamma} - 1} (1 - X)^{\frac{\beta}{\gamma} - 1}$

in the steady-state. That 3.4-14 and the contagious binomial
distribution are steady-state distributions follows from the fact that
the contagious binomial was derived in Section 3.1 by setting the
relevant time derivative equal to zero. It is shown below that 3.4-14
satisfies 3.4-13. This result further reinforces the view of the
Cohesive Elements Model as a probability diffusion model when the
number of response elements increases without limit.

In order to simplify notation, let

          $K = \frac{\Gamma\left(\frac{\alpha+\beta}{\gamma}\right)}{\Gamma\left(\frac{\alpha}{\gamma}\right)\Gamma\left(\frac{\beta}{\gamma}\right)}$

          $A = \alpha/\gamma$

          $B = \beta/\gamma$

Then

          $b(X)f(X) = K\gamma(X - X^2)X^{A-1}(1-X)^{B-1} = K\gamma X^{A}(1-X)^{B}$

and

          $a(X)f(X) = [\alpha - (\alpha+\beta)X]\,K X^{A-1}(1-X)^{B-1}$

          $\qquad = \alpha K X^{A-1}(1-X)^{B-1} - (\alpha+\beta) K X^{A}(1-X)^{B-1}$


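Taking the indicated derivatives, we find

          $\frac{\partial \{b(X)f(X)\}}{\partial X} = K\gamma A X^{A-1}(1-X)^{B} - K\gamma B X^{A}(1-X)^{B-1}$

and then

3.4-15    $\frac{\partial^2 \{b(X)f(X)\}}{\partial X^2} = K\gamma A(A-1)X^{A-2}(1-X)^{B} - K\gamma AB X^{A-1}(1-X)^{B-1}$

          $\qquad - K\gamma AB X^{A-1}(1-X)^{B-1} + K\gamma B(B-1)X^{A}(1-X)^{B-2}$

          $= K X^{A-2}(1-X)^{B-2}\{\gamma A(A-1)(1-X)^2 - 2\gamma AB X(1-X) + \gamma B(B-1)X^2\}$

          $= K X^{A-2}(1-X)^{B-2}\{\alpha(\tfrac{\alpha}{\gamma} - 1) - 2\alpha(\tfrac{\alpha}{\gamma} - 1)X + \alpha(\tfrac{\alpha}{\gamma} - 1)X^2$

          $\qquad - 2\alpha\tfrac{\beta}{\gamma}X + 2\alpha\tfrac{\beta}{\gamma}X^2 + \beta(\tfrac{\beta}{\gamma} - 1)X^2\}$

          $= K X^{A-2}(1-X)^{B-2}\{\alpha(\tfrac{\alpha}{\gamma} - 1) - 2\alpha(\tfrac{\alpha}{\gamma} + \tfrac{\beta}{\gamma} - 1)X + [\tfrac{1}{\gamma}(\alpha+\beta)^2 - (\alpha+\beta)]X^2\}$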

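Now differentiating a(X)f(X) with respect to X, one finds

3.4-16    $\frac{\partial \{a(X)f(X)\}}{\partial X} = \alpha K(A-1)X^{A-2}(1-X)^{B-1} - \alpha K(B-1)X^{A-1}(1-X)^{B-2}$

          $\qquad - (\alpha+\beta)KA X^{A-1}(1-X)^{B-1} + (\alpha+\beta)K(B-1)X^{A}(1-X)^{B-2}$

          $= K X^{A-2}(1-X)^{B-2}\{\alpha(A-1)(1-X) - \alpha(B-1)X - (\alpha+\beta)AX(1-X) + (\alpha+\beta)(B-1)X^2\}$

          $= K X^{A-2}(1-X)^{B-2}\{\alpha(\tfrac{\alpha}{\gamma} - 1) - 2\alpha(\tfrac{\alpha}{\gamma} + \tfrac{\beta}{\gamma} - 1)X + [\tfrac{1}{\gamma}(\alpha+\beta)^2 - (\alpha+\beta)]X^2\}$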

Clearly  3.4-15  equals  3.4-16  and  hence  the  beta  probability  density 
function  derived  in  Section  3.2  satisfies  the  diffusion  equation  of 
the  process. 
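
The equality of 3.4-15 and 3.4-16 can also be checked numerically. The sketch below evaluates the right hand side of the steady-state equation 3.4-13 for the beta density 3.4-14, using central finite differences in place of the analytic derivatives. The finite-difference step and the function names are illustrative choices, not part of the text.

```python
import math

def beta_density(x, alpha, beta, gamma):
    """Steady-state density 3.4-14 with A = alpha/gamma and B = beta/gamma."""
    A, B = alpha / gamma, beta / gamma
    log_k = math.lgamma(A + B) - math.lgamma(A) - math.lgamma(B)
    return math.exp(log_k + (A - 1) * math.log(x) + (B - 1) * math.log(1 - x))

def fokker_planck_residual(x, alpha, beta, gamma, h=1e-4):
    """Finite-difference value of d^2[b f]/dX^2 - d[a f]/dX at an interior point x.
    By 3.4-13 this should be zero up to discretization and rounding error."""
    def bf(u):
        # b(X) f(X), with b(X) = gamma (X - X^2) from 3.4-12
        return gamma * (u - u * u) * beta_density(u, alpha, beta, gamma)

    def af(u):
        # a(X) f(X), with a(X) = alpha - (alpha + beta) X from 3.4-11
        return (alpha - (alpha + beta) * u) * beta_density(u, alpha, beta, gamma)

    d2_bf = (bf(x + h) - 2 * bf(x) + bf(x - h)) / (h * h)
    d_af = (af(x + h) - af(x - h)) / (2 * h)
    return d2_bf - d_af
```

The residual is small only when the density and the pair a(X), b(X) belong together; replacing $\gamma$ in `bf` alone with a different value, for example, produces a residual that is visibly nonzero.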

3.5 Aggregation

Thus far, consideration has been given to the process at the level
of the individual respondent. In order to obtain estimating equations
for the model, it is necessary to aggregate responses over time for a
given respondent or to aggregate cross-sectionally in time over
individual respondents who behave according to the same process, i.e.,
have the same $\alpha$, $\beta$, and $\gamma$. Aggregation over time for a single
individual would require that a very long sequence of responses be
available for that individual [12] and that the process remain
stationary over this long time span. The latter requirement indicates
that $\alpha$, $\beta$, and $\gamma$ must remain constant over a considerable expanse
of time. While this longitudinal aggregation over a single respondent
may prove useful in certain experimental applications in the behavioral
sciences, marketing applications of this model will nearly always
preclude this method of aggregation.

The cross-sectional method of aggregation over respondents
assumed to behave according to an identical process is the preferable
procedure, at least for marketing applications, on two counts. In the
first place, marketing managers are generally more interested in the
behavior of the aggregate of individuals who make up the market place
than they are in the behavior of one particular individual or
household. Hence the cross-sectional method is preferable to the
longitudinal method of aggregation on this count. Secondly, the
cross-sectional method only requires short run stationarity of the latent
Markov process. In a marketing context long run stationarity of the
process is clearly implausible in view of the dynamic nature of the
market place. Thus, once again, the cross-sectional method of
aggregation appears to be the preferable procedure in the marketing
context.

Perhaps it is well to note just what we mean when we say that all
individuals are assumed to behave in accordance with the same process.
What we mean by this is that the parameters of the change process,
$\alpha$, $\beta$, and $\gamma$, are the same for each individual respondent in the
population. This is not at all the same thing as saying that all
individuals have the same response probability. In fact, the Cohesive
Elements Model allows for heterogeneity among respondents with respect to
their response probabilities, X(t), even after the process has reached
statistical equilibrium or the steady-state. This point is clear from
the development in Section 3.3. In contrast to historical applications
of the linear learning model to consumer brand choice behavior, the
latent Markov process discussed in this chapter does not require that
all individuals have the same initial probability of making response
$A_1$. [13]

In the remainder of this section we shall utilize the following
notation:

M          = number of individual respondents

N          = number of response elements within each individual
             respondent

$N_{1k}$   = the number of response elements within individual k
             which are associated with response $A_1$, k = 1, 2, ..., M

$X_k(t) = N_{1k}/N$
           = the probability that individual k, having $N_{1k}$ elements
             out of N associated with response $A_1$ at time t, will
             make response $A_1$ at time t

f[X(t)]    = the probability density function of the population of
             individuals with respect to their response probability,
             X(t), at time t. That is, f[X(t)] is just the
             distribution of the response probability in the population.

In this section we shall have no need to further consider the form of
f[X(t)]. When we turn to the problem of estimating f[X(t)] in Section
3.6, we shall then examine the beta form mentioned in Section 3.2,
Section 3.3, and Section 3.4. In any case, the results of this section
are free of any assumption as to the form of f[X(t)].

In Section 3.3 the mean value function of the stochastic process
$\{X(t),\ t \ge 0\}$ was investigated. If for any individual respondent only
the expectation of his initial response probability is known, then one
has

3.5-1     $E[X(t)] = E[X(0)]\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right)$

But if we wish to follow the evolution of an individual given some
initial probability X(0), we have that E[X(0)] = X(0) and the above
relation becomes

3.5-2     $E[X(t)] = X(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right)$   given X(0).

Note in 3.5-1 and 3.5-2 that knowledge of the mean value function
depends upon knowledge of the initial state of the individual. In
essence then, once E[X(0)] in 3.5-1 and X(0) in 3.5-2 are known, one
then knows E[X(t)], assuming that the process parameters are known.
Hence 3.5-2 might be more appropriately written as

3.5-3     $E[X(t) \mid X(0)] = X(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right)$

In the Cohesive Elements Model the data form a sequence of 0's and 1's
where a 1 denotes the occurrence of response $A_1$ and a 0 denotes the
occurrence of response $A_0$. In a marketing context such a sequence
might represent the sequence of brands purchased. Response $A_1$ would
be the purchase of the brand of primary concern in the study while $A_0$
would be the purchase of any other brand.

From the data one may tabulate the proportion of individuals who
made response 1 at time t and the proportion of individuals who, having
made response 1 at time 0, again make response 1 at time t. These
empirical proportions will be denoted by Q(t) and Q(0,t), respectively.
In Lemma 1 and Lemma 2 below it is shown that Q(t) and Q(0,t) are
unbiased estimates of P(t) and P(0,t), which are the theoretical expected
response proportions. The objective at this point is to see whether
or not it might be possible to relate the observed proportions to the
latent Markov process in such a way as to obtain estimates of the
parameters of the Cohesive Elements Model. The relations established
in the two lemmas proved below will provide the foundation for the
derivation of estimating equations.
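
The tabulation of the two observed proportions can be sketched as follows. The panel layout (one list of 0/1 responses per respondent, indexed by time) and the function name are assumptions made for illustration. Q(0,t) is computed here as a proportion of all M respondents, in line with the definition of P(0,t) in Lemma 2.

```python
def response_proportions(panel, t):
    """Return (Q(t), Q(0,t)) for a panel of 0/1 response sequences.

    panel[k][s] is 1 if respondent k made response A_1 at time s, else 0.
    Q(t)   = share of the M respondents making response A_1 at time t.
    Q(0,t) = share of the M respondents making A_1 at time 0 and again at t.
    """
    m = len(panel)
    q_t = sum(seq[t] for seq in panel) / m
    q_0t = sum(seq[0] * seq[t] for seq in panel) / m
    return q_t, q_0t

# Hypothetical panel of four respondents observed at times 0, 1, 2.
example_panel = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
q2, q02 = response_proportions(example_panel, 2)
```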

Let M be the total number of individual respondents in the
population. Also denote $E[X(t) \mid X(0)]$ by $m(t \mid 0)$.

Lemma 1. If P(t) denotes the expected proportion of the M respondents
making response $A_1$ at time t, then

3.5-4     $P(t) = E[X(t)] = \int_0^1 m(t \mid 0)\, dF_{X(0)}(X(0))$

and the observed proportion Q(t) is an unbiased estimate of P(t) = E[X(t)].

Proof of Lemma 1

Note that the integral in 3.5-4 is a Riemann-Stieltjes integral and
thus is valid whether X(t) is a discrete or a continuous random variable.
Since our interest has centered upon the infinite element form of the
Cohesive Elements Model, in which case X(t) is continuous, we shall
treat f[X(t)] as continuous in the proof of this lemma and also in
Lemma 2.

We first note that we know from probability theory [14] that

          $E[Y] = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\, dx.$

Hence

3.5-5     $E[X(t)] = \int_0^1 E[X(t) \mid X(0)]\, f[X(0)]\, dX(0)$

          $\qquad = \int_0^1 m(t \mid 0)\, f[X(0)]\, dX(0)$

Note that the right hand side of 3.5-5 is identical to the right
hand side of 3.5-4.

We now show that

3.5-6     $E[Q(t)] = P(t) = E[X(t)]$

which along with 3.5-5 will establish Lemma 1. Consider any
individual k. At time t he has probability $X_k(t)$ of making
response $A_1$. Now let

          $r_k = 0$   if individual k makes response $A_0$ at time t

          $r_k = 1$   if individual k makes response $A_1$ at time t

It is easily seen that $E[r_k] = X_k(t)$. We now wish to aggregate across
all M of the individuals in the population. In this case

          $E\left[\sum_{k=1}^{M} r_k\right] = \sum_{k=1}^{M} E[r_k] = \sum_{k=1}^{M} X_k(t) = M\,E[X(t)]$

is the expected number who make response $A_1$ at time t. The expected
proportion clearly is just

          $E[Q(t)] = E\left[\frac{1}{M}\sum_{k=1}^{M} r_k\right] = \frac{1}{M}\sum_{k=1}^{M} X_k(t) = E[X(t)] = P(t).$

The latter equality follows by virtue of the definition of P(t) as the
expected proportion making response $A_1$ at time t, i.e., E[Q(t)]. This
establishes 3.5-6, which along with 3.5-5 completes the proof of the lemma.

Lemma 2. If P(0,t) denotes the expected proportion of the M respondents
making response $A_1$ at time 0 and $A_1$ again at time t, then

3.5-7     $P(0,t) = E[X(0,t)] = \int_0^1 m(t \mid 0)\, X(0)\, f[X(0)]\, dX(0)$

and the observed proportion Q(0,t) is an unbiased estimate of
P(0,t) = E[X(0,t)].

Proof of Lemma 2

Using certain results from probability theory [15] and the fact that
X(0,t) = X(0)X(t) by the Pseudo-Bernoulli Trials Axiom in Chapter II,
we have

3.5-8     $E[X(0,t)] = E[X(0)X(t)] = \int_0^1 E[X(0)X(t) \mid X(0)]\, f[X(0)]\, dX(0)$

          $\qquad = \int_0^1 E[X(t) \mid X(0)]\, X(0)\, f[X(0)]\, dX(0)$

          $\qquad = \int_0^1 m(t \mid 0)\, X(0)\, f[X(0)]\, dX(0)$

Note that the right hand side of 3.5-8 is just the right hand side of
3.5-7.

It remains to show that

3.5-9     $E[Q(0,t)] = P(0,t) = E[X(0,t)]$

Consider any individual k. His probabilities of making response $A_1$ at
times 0 and t are $X_k(0)$ and $X_k(t)$. Recall that under the assumptions
of the model, successive responses for a given individual are
independent. That is, if

          $r_k(t) = 0$   if individual k makes response $A_0$ at time t

          $r_k(t) = 1$   if individual k makes response $A_1$ at time t,

then the independence of the trials (responses) implies that

          $P[r_k(t) \mid r_k(0)] = P[r_k(t)].$

Hence the probability of the compound event $r_k(0)\,r_k(t) = 1 \cdot 1 = 1$ is
just $X_k(0)X_k(t) \equiv X_k(0,t)$. Now we may view the occurrence or
non-occurrence of the compound event $r_k(0)\,r_k(t) = 1$ as a Bernoulli process
with Bernoulli parameter $X_k(0,t)$. It is well known that for a single
Bernoulli trial we have

          $E[r_k(0)\,r_k(t)] = X_k(0)X_k(t) = X_k(0,t).$

Now for a population of M independent individuals we have

          $E\left[\sum_{k=1}^{M} r_k(0)\,r_k(t)\right] = \sum_{k=1}^{M} E[r_k(0)\,r_k(t)] = \sum_{k=1}^{M} X_k(0,t) = M\,E[X(0,t)]$

The expected proportion clearly is

          $E[Q(0,t)] = E\left[\frac{1}{M}\sum_{k=1}^{M} r_k(0)\,r_k(t)\right] = \frac{1}{M}\sum_{k=1}^{M} X_k(0,t) = E[X(0,t)] = P(0,t).$

This establishes 3.5-9 which together with 3.5-8 establishes Lemma 2.

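We now proceed to the development of the estimating equations
using these two lemmas and the results of Section 3.3. Using 3.5-3
in 3.5-4 we obtain

3.5-10    $P(t) = \int_0^1 m(t \mid 0)\, f[X(0)]\, dX(0)$

          $\qquad = \int_0^1 \left\{X(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right)\right\} f[X(0)]\, dX(0)$

          $\qquad = e^{-(\alpha+\beta)t} \int_0^1 X(0)\, f[X(0)]\, dX(0) + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right) \int_0^1 f[X(0)]\, dX(0)$

          $\qquad = P(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right)$

Similarly, if we use 3.5-3 in 3.5-7, we will find

3.5-11    $P(0,t) = P(0,0)\,e^{-(\alpha+\beta)t} + P(0)\,\frac{\alpha}{\alpha+\beta}\left[1 - e^{-(\alpha+\beta)t}\right]$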
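
The two aggregate relations 3.5-10 and 3.5-11 are simple enough to evaluate directly. The following Python sketch does so for given values of the parameters and of the initial moments P(0) and P(0,0); the function and argument names are illustrative and not part of the text.

```python
import math

def expected_proportions(t, alpha, beta, p0, p00):
    """Evaluate 3.5-10 and 3.5-11.

    p0  = P(0),   the first raw moment of the initial distribution.
    p00 = P(0,0), the second raw moment of the initial distribution.
    Returns (P(t), P(0,t)).
    """
    s = alpha + beta
    decay = math.exp(-s * t)
    limit = alpha / s                       # long-run mean response probability
    p_t = p0 * decay + limit * (1.0 - decay)
    p_0t = p00 * decay + p0 * limit * (1.0 - decay)
    return p_t, p_0t
```

At t = 0 the function returns the initial moments themselves, and for large t it tends to $\alpha/(\alpha+\beta)$ and $P(0)\,\alpha/(\alpha+\beta)$, so the dependence of P(0,t) on P(0,0) fades at the rate $\alpha+\beta$.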


It is necessary at this point to consider what we mean by P(0,0). It
may be thought of in two ways. The first is to note that by reasoning
analogous to the proofs of the lemmas we have $P(0,0) = E[X^2(0)]$.
Hence it may be considered the second raw moment of the initial
distribution of response probability across the population. An alternative
but related view is to consider P(0,0) as the expected proportion
making response $A_1$ on two successive occasions when there is no change
in probability on the part of any of the M individuals between these
two response occasions.

A sequence of two responses in a situation in which the response
probabilities do not change enables one to obtain information relative
to the distribution of response probability in the population. In
terms of the present model P(0,0) represents a hypothetical
replication of the responses at time 0 when there has been no intervening
change in response probabilities. Clearly, this is not an observable
outcome in this case since the response probabilities are assumed to
change between successive responses. Fortunately P(0,0) may be
estimated along with $\alpha$ and $\beta$. [16] Since P(0,0) may be considered the raw
second moment of the initial distribution of response probability in
the population, having estimates of P(0) and P(0,0) will enable us
to estimate the mean and variance of the initial distribution of
response probability irrespective of the functional form of this
distribution.

3.6  Estimating the Cross-Sectional Distribution of Response Probability

As we discussed in Section 2.1, an individual's probability of
purchasing Brand 1 (making response $A_1$) is a most appealing measure
of brand loyalty to Brand 1. It is of obvious interest to the
marketing manager, whose interest centers on the behavior of aggregates
of individuals in the market place, to ascertain the distribution of
response probability across the population at any time t. In our
previous notation, he would like to know something about f[X(t)], which
might alternatively be termed the distribution of brand loyalty. In
the remainder of this section we shall consider methods for obtaining
estimates of the first two raw moments of f[X(t)] and then discuss
the possibility of using these moments to estimate f[X(t)] as a beta
probability law.

In the previous section we had in 3.5-5 that

3.6-1     $P(t) = E[X(t)] = \int_0^1 X(t)\, f[X(t)]\, dX(t),$


where P(t) is the first raw moment of the distribution of response
probability at time t. In a manner analogous to the proof of 3.5-8 we
could also show that

3.6-2     $P(t,t) = E[X^2(t)] = \int_0^1 X^2(t)\, f[X(t)]\, dX(t)$

which is the second raw moment of the response probability distribution
at time t. Now P(t) may be estimated by the observed proportion, Q(t),
making response $A_1$ at time t. However, since we have postulated a
model which is non-stationary with respect to the response
probabilities of the individuals in the population, there is no observable
response proportion which we may use to estimate 3.6-2. Another way
to see that this is an unobservable outcome is to note that P(t,t) is
the expected proportion making response $A_1$ on two successive response
occasions when each individual in the population experiences no change
in his response probability between these responses. Fortunately, if
we have responses for times 0, 1, ..., T, and if we know or have estimates
of $\alpha$ and $\beta$, we are able to develop an estimate of P(t,t) for
t = 0, 1, ..., T-1.

The method we shall use to estimate P(t,t) entails some knowledge
of $\alpha$ and $\beta$. In essence, the procedure outlined below amounts to the
following strategy in the use of this model:

1.  Obtain estimates of $\alpha$ and $\beta$ from equations such as 3.5-10
and 3.5-11 by one of the methods discussed in Chapter IV.

2.  Use these estimates of $\alpha$ and $\beta$ to filter the expected
change from the raw data. The transformed data will then
yield estimates of P(t,t) for t = 0, 1, ..., T-1.

In a manner analogous to the proof of Lemma 2 we could show that

3.6-3     $P(t,t+1) = E[X(t)X(t+1)]$

          $\qquad = \int_0^1 m(t+1 \mid t)\, X(t)\, f[X(t)]\, dX(t)$

Since we have assumed that the process is stationary with respect to
the parameters $\alpha$ and $\beta$, the choice of the time origin in 3.5-3 is
essentially arbitrary. Only the elapsed time, t, is of importance. Hence
for $m(t+1 \mid t) = E[X(t+1) \mid X(t)]$ in 3.6-3 we would have

3.6-4     $m(t+1 \mid t) = X(t)\,e^{-(\alpha+\beta)} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)}\right)$

since t+1-t = 1. Substituting 3.6-4 into 3.6-3 we have

3.6-5     $P(t,t+1) = \int_0^1 \left\{X(t)\,e^{-(\alpha+\beta)} + \frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)}\right)\right\} X(t)\, f[X(t)]\, dX(t)$

          $\qquad = E[X^2(t)]\,e^{-(\alpha+\beta)} + \frac{\alpha}{\alpha+\beta}\,E[X(t)]\left(1 - e^{-(\alpha+\beta)}\right)$

          $\qquad = P(t,t)\,e^{-(\alpha+\beta)} + \frac{\alpha}{\alpha+\beta}\,P(t)\left(1 - e^{-(\alpha+\beta)}\right)$

Now P(t,t+1) may be estimated by Q(t,t+1), which is an observable
proportion. Since we have already assumed that we have estimates of $\alpha$ and
$\beta$, and since Q(t) is observable, we may solve 3.6-5 for the estimate
of P(t,t) to use in 3.6-2. The solution is

3.6-6     $P(t,t) = \frac{\alpha}{\alpha+\beta}\,P(t) + e^{(\alpha+\beta)}\left\{P(t,t+1) - \frac{\alpha}{\alpha+\beta}\,P(t)\right\}$

Thus we now may obtain empirical estimates for both 3.6-1 and 3.6-2,
the first two raw moments of the response probability distribution.
We use Q(t) and Q(t,t+1) as estimates of P(t) and P(t,t+1), respectively,
in 3.6-1 and 3.6-6.
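
The filtering step can be sketched in Python as follows. The first function is the forward relation 3.6-5; the second solves it for P(t,t), which requires multiplying through by $e^{(\alpha+\beta)}$. The function names are illustrative, and the estimates of $\alpha$ and $\beta$ are assumed to be available from a separate estimation step.

```python
import math

def joint_next_period(p_t, p_tt, alpha, beta):
    """Forward relation 3.6-5: P(t,t+1) from P(t) and P(t,t)."""
    s = alpha + beta
    e = math.exp(-s)
    return p_tt * e + (alpha / s) * p_t * (1.0 - e)

def second_moment_estimate(q_t, q_t_next, alpha, beta):
    """Solve 3.6-5 for P(t,t), using the observed proportions
    Q(t) and Q(t,t+1) in place of P(t) and P(t,t+1)."""
    s = alpha + beta
    limit = alpha / s
    return limit * q_t + math.exp(s) * (q_t_next - limit * q_t)
```

Because the estimate multiplies the observed difference by $e^{(\alpha+\beta)}$, sampling error in Q(t,t+1) is amplified when $\alpha+\beta$ is large, which is one reason to prefer a short interval between successive responses.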

It was argued in Section 3.2 that even in non-equilibrium
situations the beta probability law might be a reasonable form for the
distribution of response probability. Of course, the beta distribution of
response probability will be exact when the system is in statistical
equilibrium. [18] Since the beta is a two parameter distribution, the
two raw moments given above will prove sufficient to identify the
distribution.

If we use the form

3.6-7     $f_\beta(X(t) \mid r, n) = \frac{\Gamma(n)}{\Gamma(r)\,\Gamma(n-r)}\,\{X(t)\}^{r-1}\{1 - X(t)\}^{n-r-1}$

where n > r > 0

we then have from Raiffa and Schlaifer (1961) that the first two raw
moments are

3.6-8     $\mu_1 = \frac{r}{n}$

3.6-9     $\mu_2 = \frac{r(r+1)}{n(n+1)}$

Then  to  obtain  a  moment  estimate  of  3.6-7  we  equate  3.6-8  to  3.6-1  and 
3.6-9  to  3.6-6  and  solve  for  r  and  n,  the  two  beta  parameters.   While 
moment  estimators  are  generally  somewhat  inefficient,  the  complexity 
of  the  development  of  just  the  moment  estimators  makes  it  doubtful  that 
more  efficient  procedures  will  prove  to  be  analytically  tractable  for 
this  model. 
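
Solving the two moment equations for r and n is a short calculation, sketched below in Python. It assumes the moment formulas $\mu_1 = r/n$ and $\mu_2 = r(r+1)/[n(n+1)]$ stated above, which give $n = (\mu_1 - \mu_2)/(\mu_2 - \mu_1^2)$ and $r = \mu_1 n$. The function name and the validity checks are illustrative additions.

```python
def beta_moment_estimates(m1, m2):
    """Moment estimates (r, n) of the beta parameters from the first two
    raw moments m1 = E[X] and m2 = E[X^2], assuming
    m1 = r / n and m2 = r (r + 1) / (n (n + 1))."""
    variance = m2 - m1 * m1
    if not (0.0 < m1 < 1.0) or variance <= 0.0 or m2 >= m1:
        # For a non-degenerate distribution on (0, 1): m1^2 < m2 < m1.
        raise ValueError("moments are not consistent with a beta distribution")
    n = (m1 - m2) / variance
    r = m1 * n
    return r, n
```

With sample-based inputs such as Q(t) and the filtered estimate of P(t,t), the consistency condition $\mu_1^2 < \mu_2 < \mu_1$ can fail through sampling error alone, so a practical routine would need to decide how to treat such cases.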

3.7  Summary 

In this chapter we have formulated and developed a model which
promises to be a reasonable model of consumer brand choice. We first
presented Coleman's results on the steady state distribution for a
finite number of elements. Coleman termed this the Contagious Binomial
distribution. We then proved that as the number of elements increases
without limit, the Contagious Binomial tends to the beta distribution.
The derivation of the mean and variance of response probability
demonstrated that the model seems to be a viable representation in that the
postulated change process operating on an individual's response
probability is truly stochastic. The diffusion limit and its relation to
the steady-state distribution when the number of response elements
increases indefinitely was next developed. Finally, aggregation,
preliminary estimating equations, and a procedure for estimating the
distribution of response probability across a population of respondents
at some trial were presented.

Footnotes


1.  This  section  closely  follows  Coleman's  development  in  Chapters  11, 
12,  and  13  of  Introduction  to  Mathematical  Sociology--Coleman  (1964b) 

2.  The  diagrammatic  presentation  is  intended  to  clarify  how  the  assumed 
process  at  the  level  of  the  elements  induces  the  N+l  state  process 
at  the  level  of  the  individual. 

3.  Coleman  (1964b,  p.  345). 

4.  Coleman  (1964b,  p.  345). 

5.  This  restriction  is  analogous  to  that  used  in  the  development  of 
the  Poisson  limit  of  the  Binomial  distribution.   See  Feller  (1957). 

6.  I  am  grateful  to  Prof.  James  McGregor  of  the  Mathematics  Department, 
Stanford  University,  for  pointing  this  result  out  to  me.  See 
Copson  (1935),  Chapter  9  for  further  results  on  gamma  functions „ 

7.  For  stationary  Bernoulli  trials  a  beta  distribution  of  the  Bernoulli 
parameter  is  the  natural  conjugate.  That  is,  if  the  parameter  has 

a  beta  prior  and  Bernoulli  trials  are  performed  to  obtain  sample 
evidence,  the  posterior  distribution  of  the  parameter  will  be  a  beta 
distribution.  See  Raiffa  and  Schlaifer  (1961)  Chpt.  9  for  further 
elaboration  of  this  point. 

8.  This  is  in  contrast  to  genetic  models  without  imitation  in  which 
both  of  the  two  extreme  gene  frequencies  are  absorbing  states. 

9.  Note  that  Eqn.  3.3-1  is  identical  to  Coleman's  Eqn.  1.2  on  page  17 
of  Models  of  Change  and  Response  Uncertainty  in  the  two  response 
case  if  one  interprets  Coleman's  v~      as  follows ; 

v   =  m(t)  =  E[x(t)] 

In  Models  of  Change  and  Response  Uncertainty,  Coleman  simply  asserts 
equation  3.3-1.   However,  in  Chapter  13  of  Introduction  to  Mathe- 
matical Sociology  he  derives  3.3-1  by  an  alternative  argument.   The 
procedure  used  here  makes  it  easier  to  investigate  the  variance  of 
the  process,  which  Coleman  didn't  do  in  either  of  the  works  cited 
above.   He  did,  however,  give  the  variance  of  the  steady-state  dis- 
tribution for  the  contagious  binomial  in  Chapter  4  of  Models  of 
Change  and  Response  Uncertainty. 
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10.  It  might  be  noted  that  the  expectation  technique  used  in  this  sec- 
tion can  also  be  used  to  find  the  latent  roots  of  the  transition 
matric  of  the  model  by  a  method  developed  by  Moran  (1958a).  Also 
see  Gani  (1961). 

11.  See  Feller  (1950). 

12. Estimates of the model are obtained using certain observable proportions of responses. If the model is to be applied to a response sequence from a single individual, two conditions must be fulfilled:

(a) there must be three or more groups of trials (responses) between which the individual's response probability X(t) may change,

(b) within each of these groups no change may take place in his response probability X(t).

Condition (a) is necessary because three groups would be just sufficient to identify the model and retain one degree of freedom for error. See Chapter IV for details concerning the estimation procedures. Under condition (b) the response sequence within each group of trials must be sufficiently long to yield reasonably stable empirical estimates of the expected response proportions. To fulfill these conditions would require a very long sequence of responses in addition to the assumption that the change process has remained stationary over this very long sequence.

13.  In  a  recent  paper  Massy  (1965)  presents  estimation  procedures  for 
linear  learning  models  in  which  the  initial  response  probability 
may  be  distributed  across  the  population  of  respondents. 

14.  See  Parzen  (1960,  p.  384). 

15.  See  Parzen  (1960,  p.  384)  and  Parzen  (1962,  p.  62). 

16.  If  we  were  to  formulate  the  mean  value  function  of  the  process  in 
the  backward  sense,  then  y   would  also  have  to  be  estimated. 

17.  The  end  effect  may  be  resolved  if  we  formulate  a  backward  change 
process  and  estimating  procedure.   This  topic  will  be  considered 
in  future  applications  of  this  model.   It  is  not  explored  in  this 
report. 

18. See Feller, Introduction to Probability Theory, for the notion of statistical equilibrium.


Chapter  IV 
ESTIMATION  PROCEDURES  FOR  THE  COHESIVE  ELEMENTS  MODEL 

If the Cohesive Elements Model is ever to be tested, and if it is ever to have any practical relevance to the marketing scientist, to say nothing of the marketing manager, then estimation procedures and methods for testing the model must be developed. This chapter presents some results that further the goals of estimation, testing, and comparison of the Cohesive Elements Model with alternative stochastic models of consumer brand choice.

Two alternative estimation procedures will be discussed. The regression method has the virtue of being computationally convenient. However, the asymptotic properties of the estimators and the measures of fit of the model which may be obtained from the regression procedure are less clear than for the minimum chi square method. The minimum chi square estimation procedure yields parameter estimates with desirable asymptotic properties. In addition, it provides a single test of the fit of the model. Unfortunately, there does not appear to be any hope of obtaining an analytic solution for the minimum chi square estimates of the model's parameters. However, given the present state of computer technology, numerical minimization is a viable computational procedure.

The first section of this chapter will give consideration to theoretical constraints on the values of the model's parameters. The next two sections will present the regression and the minimum chi square estimation procedures, respectively. Finally, a brief discussion of the pros and cons of the previously developed methods for estimating the model is given.

4.1  General Considerations

In the previous chapter we developed the following two equations, which we shall utilize in obtaining estimates of $P(0,0)$, $\alpha$, and $\beta$:

4.1-1    $P(t) = P(0)\,e^{-(\alpha+\beta)t} + \frac{\alpha}{\alpha+\beta}\left(1-e^{-(\alpha+\beta)t}\right)$

4.1-2    $P(0,t) = P(0,0)\,e^{-(\alpha+\beta)t} + P(0)\,\frac{\alpha}{\alpha+\beta}\left(1-e^{-(\alpha+\beta)t}\right)$

First note that if we let $t \to \infty$ in 4.1-1 and 4.1-2, that is, let the model approach its steady-state value, then we have

4.1-3    $\lim_{t\to\infty} P(t) = \frac{\alpha}{\alpha+\beta}$

4.1-4    $\lim_{t\to\infty} P(0,t) = P(0)\,\frac{\alpha}{\alpha+\beta}$

From 4.1-3 we see that $\alpha/(\alpha+\beta)$ is the equilibrium or steady-state choice share of response $A_1$. It is also clear from 4.1-1 and 4.1-2 that $\alpha+\beta$ represents the rate at which the model approaches its steady-state.²

To capture these behavioral notions in our estimating procedure, and also for reasons of notational convenience, we reparameterize 4.1-1 and 4.1-2 as follows:

4.1-5    $a = \frac{\alpha}{\alpha+\beta}$

4.1-6    $k = \alpha+\beta$

We now have in place of 4.1-1 and 4.1-2

4.1-7    $P(t) = P(0)\,e^{-kt} + a\left(1-e^{-kt}\right)$

and

4.1-8    $P(0,t) = P(0,0)\,e^{-kt} + a\,P(0)\left(1-e^{-kt}\right)$
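As an illustration only, the reparameterized relations 4.1-7 and 4.1-8 can be evaluated numerically. The sketch below is not part of the original report: the function names and the parameter values are assumptions chosen for illustration, with $P(0,0)$ picked to lie between $P^2(0)$ and $P(0)$.

```python
import math

def mean_response(t, p0, a, k):
    """P(t) from 4.1-7: average probability of response A1 at time t."""
    decay = math.exp(-k * t)
    return p0 * decay + a * (1.0 - decay)

def joint_response(t, p0, p00, a, k):
    """P(0,t) from 4.1-8: probability of response A1 at time 0 and again at time t."""
    decay = math.exp(-k * t)
    return p00 * decay + a * p0 * (1.0 - decay)

# Hypothetical parameter values (assumed for illustration, not estimates from the study).
p0, p00, a, k = 0.40, 0.25, 0.60, 0.30

for t in (0, 1, 5, 50):
    # As t grows, P(t) should approach a and P(0,t) should approach a * P(0).
    print(t, round(mean_response(t, p0, a, k), 4),
          round(joint_response(t, p0, p00, a, k), 4))
```

With these assumed values the printed moments move from their initial values toward the steady-state values given by 4.1-3 and 4.1-4, at a speed governed by $k$.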


Prior to developing the regression and the minimum chi square estimation procedures, we must first consider the inherent constraints on the parameters themselves. Recall from the axioms, M1-M5, in Section 2.2 that $\alpha$ and $\beta$ represent transition intensities in opposite directions. We suffer no loss of generality if we require that:

4.1-9    $\alpha \ge 0$ and $\beta \ge 0$

Naturally this implies that

4.1-10    $k = \alpha+\beta \ge 0$

It is interesting to note the implications of $k=0$. In this case no elements change their response association through time. Hence, in this case the model is stationary with respect to the response probabilities. Morrison (1965b) has developed methods for estimating the parameters and testing the assumption of just such a heterogeneous, stationary Bernoulli process as is represented by the Cohesive Elements Model with $k=0$. In simple English, $k=0$ means that an individual respondent (consumer) does not change his response probability through time.

The equilibrium choice share of response $A_1$, i.e., "$a$", is also bounded. Recalling that $a = \alpha/(\alpha+\beta)$, it is clear that

4.1-11    $0 \le a \le 1$


It remains now to establish bounds for $P(0,0)$. Since $P(0,0) = E[X^2(0)]$ and $\mathrm{Var}[X(0)] \ge 0$, it is clear that

4.1-12    $P(0,0) \ge P^2(0) \ge 0$

It is interesting to note that if $P(0,0) = P^2(0)$, then $\mathrm{Var}[X(0)] = 0$ and all individuals in the population have the same initial probability of making response $A_1$. An upper bound for $P(0,0)$ given $P(0)$ can also be formulated. This result is stated below as Lemma 3.

Lemma 3.  Given a particular $P(0) = E[X(0)]$ where $0 \le P(0) \le 1$, the upper bound for $P(0,0) = E[X^2(0)]$ is

$P(0,0) \le P(0)$

Proof of Lemma 3.  Suppose for the purpose of argument, and in this case without loss of generality, that $X(0)$ may take on only two values, $X_1(0)$ and $X_2(0)$. Let the proportions of the respondents having $X_1(0)$ and $X_2(0)$ be $m_1$ and $m_2$, respectively. Clearly $m_2 = 1-m_1$. Now we have that

4.1-13    $P(0) = E[X(0)] = X_1 m_1 + X_2 m_2$

and

4.1-14    $P(0,0) = E[X^2(0)] = X_1^2 m_1 + X_2^2 m_2$

Hence, after some algebra

4.1-14    $\mathrm{Var}[X(0)] = E[X^2(0)] - E^2[X(0)] = X_1^2 m_1 + X_2^2 m_2 - (X_1 m_1 + X_2 m_2)^2 = (m_1 - m_1^2)(X_1 - X_2)^2$

Clearly for given $m_1$ and $P(0)$, $\mathrm{Var}[X(0)]$ and hence $P(0,0)$ are maximized when $|X_1 - X_2| = 1$. But since $0 \le X(0) \le 1$, the only case for which the absolute difference between the two values of $X(0)$ may equal one is for one of the values to be one and the other zero. Letting a "^" denote the maximum variance case we have that

4.1-15    $\widehat{\mathrm{Var}}[X(0)] = \hat{P}(0,0) - P^2(0) = m_1 - m_1^2$

But now since $X_1 = 1$ and $X_2 = 0$ we have from 4.1-13 that

4.1-16    $P(0) = m_1$

Hence from 4.1-15 and 4.1-16, we have that

4.1-17    $\hat{P}(0,0) = m_1 - m_1^2 + P^2(0) = m_1 = P(0)$

Since $\hat{P}(0,0)$ is the maximum value of $P(0,0)$ given $P(0)$, 4.1-17 establishes the upper bound stated as Lemma 3. Note that if $P(0,0) = P(0)$, then all respondents are clustered at the two extreme initial response probabilities, $X(0) = 0$ and $X(0) = 1$. The proportion of the respondents having $X(0) = 1$ is just $P(0)$. For the strict inequalities

$P^2(0) < P(0,0) < P(0)$

the individual respondents are distributed across the response probability continuum.
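The bounds $P^2(0) \le P(0,0) \le P(0)$ can be checked numerically for any assumed distribution of initial response probabilities. The following is a minimal sketch, not part of the original analysis: the three hypothetical populations (a beta-distributed one, a homogeneous one, and one clustered at 0 and 1) are assumptions chosen to hit the interior and both boundary cases.

```python
import random

def initial_moments(probs):
    """Return (P(0), P(0,0)) = (E[X(0)], E[X(0)^2]) for a list of individual probabilities."""
    m = len(probs)
    p0 = sum(probs) / m
    p00 = sum(x * x for x in probs) / m
    return p0, p00

rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Three hypothetical populations of initial response probabilities X(0).
populations = {
    "spread over (0,1)": [rng.betavariate(2.0, 3.0) for _ in range(10_000)],
    "homogeneous":       [0.4] * 10_000,                 # lower bound: P(0,0) = P(0)^2
    "clustered at 0, 1": [1.0] * 4_000 + [0.0] * 6_000,  # upper bound: P(0,0) = P(0)
}

for name, probs in populations.items():
    p0, p00 = initial_moments(probs)
    print(name, round(p0 * p0, 4), round(p00, 4), round(p0, 4))
```

Each printed row lists $P^2(0)$, $P(0,0)$, and $P(0)$ in that order, so the middle value should never fall outside the other two.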

Before turning to the development of the estimation procedures, it is well to consider the properties of the sample estimates $Q(t)$ and $Q(0,t)$ of the population moments $P(t)$ and $P(0,t)$, respectively. Recall that in Section 3.5 Lemma 1 and Lemma 2 demonstrated that

$E[Q(t)] = P(t) = E[X(t)]$

and

$E[Q(0,t)] = P(0,t) = E[X(0)X(t)] = E[X(0,t)]$

where

$Q(t)$ is the sample proportion (moment) making response $A_1$ at time $t$.

$E[X(t)]$ is the average probability of making response $A_1$ in the population at time $t$.

$Q(0,t)$ is the sample proportion making response $A_1$ at time zero and again at time $t$.

$E[X(0,t)] = E[X(0)X(t)]$ is the average probability in the population of making response $A_1$ at time 0 and again at time $t$.

$P(t) = E[X(t)]$

$P(0,t) = E[X(0,t)] = E[X(0)X(t)]$

Thus the lemmas proved that $Q(t)$ and $Q(0,t)$ are unbiased estimates of $P(t)$ and $P(0,t)$, respectively.

It has been shown (Cramer, 1946, p. 207, and Kendall and Stuart, 1958, pp. 126-27) that the variance of the sample proportion, $Q(t)$, in a heterogeneous Bernoulli population is just

4.1-18    $\mathrm{Var}[Q(t)] = \frac{P(t)\left(1-P(t)\right) - \mathrm{Var}[X(t)]}{M} = \frac{E[X(t)] - E^2[X(t)] - \mathrm{Var}[X(t)]}{M} = \frac{E[X(t)] - E[X^2(t)]}{M}$

where $M$ is the number of respondents entering into the proportions. Since $E[X(t)]$ and $E[X^2(t)]$ are independent of $M$, we see that

$\lim_{M\to\infty} \mathrm{Var}[Q(t)] = 0.$

Equation 4.1-18 will prove useful below when we consider the asymptotic normality of $Q(t)$.
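A small numerical check may help fix ideas about 4.1-18. The sketch below is illustrative only and rests on an assumption: the $M$ respondents are treated as a fixed group with known individual probabilities, each making one independent Bernoulli trial, so that the variance of the sample proportion can also be computed directly as $\sum_i p_i(1-p_i)/M^2$. The function names and the probabilities are hypothetical.

```python
def var_q_direct(probs):
    """Variance of the sample proportion as a sum of independent Bernoulli variances."""
    m = len(probs)
    return sum(p * (1.0 - p) for p in probs) / (m * m)

def var_q_formula(probs):
    """Variance of the sample proportion from 4.1-18: (E[X] - E[X^2]) / M."""
    m = len(probs)
    mean = sum(probs) / m
    mean_sq = sum(p * p for p in probs) / m
    return (mean - mean_sq) / m

heterogeneous = [0.1, 0.5, 0.9]   # hypothetical individual response probabilities
homogeneous = [0.5, 0.5, 0.5]     # same mean, no dispersion across individuals

print(var_q_direct(heterogeneous), var_q_formula(heterogeneous))
print(var_q_direct(homogeneous), var_q_formula(homogeneous))
```

The two computations agree for each group, and the heterogeneous group shows the smaller variance, which is the effect of the $-\mathrm{Var}[X(t)]$ term in 4.1-18.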

The sample proportions $Q(t)$ and $Q(0,t)$ have also been shown to converge in probability to the population values.³ That is,

$\lim_{M\to\infty} P\left(|Q(t)-P(t)| \ge \epsilon\right) = 0$

and

$\lim_{M\to\infty} P\left(|Q(0,t)-P(0,t)| \ge \epsilon\right) = 0$

where $\epsilon$ is an arbitrarily small positive real number.

Finally, it can be shown under certain rather general conditions that $Q(t)$ is asymptotically normal with mean $P(t)$ and variance

$\frac{P(t)\left(1-P(t)\right) - \mathrm{Var}[X(t)]}{M}$

For a heterogeneous population the Central Limit Theorem holds when Liapounoff's conditions are satisfied.⁴ Liapounoff's result may be stated as follows:

Let $r_1, r_2, \ldots, r_j, \ldots$ be independent random variables and denote the mean and standard deviation of $r_j$ by $\mu_j$ and $\sigma_j$, respectively. Assume that the third absolute moment of $r_j$ about its mean is finite for all $j$. That is,

$\rho_j^3 = E\left[\,|r_j - \mu_j|^3\,\right] < \infty$

Let $\rho^3 = \rho_1^3 + \rho_2^3 + \cdots + \rho_M^3$.

If $\lim_{M\to\infty} \rho/\sigma = 0$, then $R = \sum_{j=1}^{M} r_j$ is asymptotically normal with mean $\mu = \sum_{j=1}^{M} \mu_j$ and variance $\sigma^2 = \sum_{j=1}^{M} \sigma_j^2$.

Cramer (1946, pp. 217-8) has shown that a sufficient condition for the sample proportion of a heterogeneous Bernoulli population to be asymptotically normal is that no member of the population has a probability of exactly 0 or exactly 1 of making response $A_1$. This condition, while only sufficient and not necessary for asymptotic normality of $Q(t)$, would not be restrictive in applications of the Cohesive Elements Model to human populations, particularly consumer populations. Intuitively it seems very unlikely that any brand is so overwhelmingly superior that it will be chosen with certainty or so inferior that it would be chosen under no circumstances on a particular purchase occasion by a consumer. It seems very reasonable to assume that the probability of making response $A_1$ never gets to precisely 0 or precisely 1. Note that this view is consistent with Kuehn's argument for incomplete learning in the consumer brand choice situation.
In summary, the sample proportion $Q(t)$:

1. is an unbiased estimate of $P(t)$,

2. converges in probability to $P(t)$, and

3. is asymptotically normal with mean $P(t) = E[X(t)]$ and variance $\left(E[X(t)] - E[X^2(t)]\right)/M$ under certain seemingly reasonable conditions.

Similar results hold for $Q(0,t)$.

4.2  Regression  Estimation  Procedure 

In this section we shall develop a regression or least squares procedure for estimating the parameters of the Cohesive Elements Model. The fundamental regression relations will first be derived. Then consideration will be given to sources of error in the regression relations. Attention will next focus upon the regression procedure, its properties, and its pitfalls. Contrast of the regression procedure to the minimum chi square method developed in Section 4.3 is deferred until Section 4.4.

4.2.1  The  Fundamental  Regression  Relations 

The fundamental regression relations are derived from 4.1-7 and 4.1-8. Equation 4.1-7 may be rearranged to yield

4.2-1    $P(t) - P(0)\,e^{-kt} = a\left[1-e^{-kt}\right]$

Substituting 4.2-1 into 4.1-8 then yields

4.2-2    $P(0,t) = P(0,0)\,e^{-kt} + P(0)P(t) - P^2(0)\,e^{-kt},$

which may be written as

4.2-2a    $P(0,t) - P(0)P(t) = \left[P(0,0) - P^2(0)\right]e^{-kt}$

Taking natural logarithms of 4.2-2a we find

4.2-3    $\ln\left[P(0,t) - P(0)P(t)\right] = \ln\left[P(0,0) - P^2(0)\right] - kt,$

which is the first of our basic regression relations. Rearranging 4.1-7 in yet another way leads to

4.2-4    $P(t) - a = e^{-kt}\left[P(0) - a\right]$

Similarly, rearranging 4.1-8 we have

4.2-5    $P(0,t) - a\,P(0) = e^{-kt}\left[P(0,0) - a\,P(0)\right]$

It is easy to see from 4.2-4 and 4.2-5 that we now have

4.2-6    $\frac{P(t) - a}{P(0) - a} = \frac{P(0,t) - a\,P(0)}{P(0,0) - a\,P(0)}$

Equation 4.2-6 may be written as

4.2-7    $\left[P(t) - a\right]\left[P(0,0) - a\,P(0)\right] = \left[P(0) - a\right]\left[P(0,t) - a\,P(0)\right]$

With a bit of algebra we find from 4.2-7 that

4.2-8    $P(0)P(0,t) = P(t)P(0,0) - a\,P(0,0) + a\left[P(0,t) + P^2(0) - P(0)P(t)\right]$

Equations 4.2-3 and 4.2-8 form the basis for the regression estimation procedure.⁵ For convenience of discussion these equations are restated below

4.2-3    $\ln\left[P(0,t) - P(0)P(t)\right] = \ln\left[P(0,0) - P^2(0)\right] - kt$

4.2-8    $P(0)P(0,t) = -a\,P(0,0) + P(0,0)P(t) + a\left[P(0,t) - P(0)P(t) + P^2(0)\right]$

for $t = 1, 2, \ldots, T$. The parameters of these two equations are $a$, $k$, and $P(0,0)$. The variables are $t$, which indexes time or response occasions, and the $P(t)$ and $P(0,t)$, which are expected response proportions that may be directly estimated by sample proportions.
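That 4.2-3 and 4.2-8 hold exactly for moments generated by 4.1-7 and 4.1-8 can be confirmed numerically. The following sketch is an illustration under assumed parameter values (hypothetical, not estimates from the study); it generates the expected proportions from the model and evaluates the difference between the two sides of each relation.

```python
import math

# Hypothetical parameter values (assumed); P(0,0) must exceed P(0)^2 for the log to exist.
p0, p00, a, k = 0.40, 0.25, 0.60, 0.30

def moments(t):
    """Expected proportions P(t) and P(0,t) from 4.1-7 and 4.1-8."""
    decay = math.exp(-k * t)
    return p0 * decay + a * (1.0 - decay), p00 * decay + a * p0 * (1.0 - decay)

def residuals(t):
    """Left side minus right side of 4.2-3 and of 4.2-8 at time t."""
    pt, p0t = moments(t)
    r_log = math.log(p0t - p0 * pt) - (math.log(p00 - p0 ** 2) - k * t)
    r_lin = p0 * p0t - (-a * p00 + p00 * pt + a * (p0t - p0 * pt + p0 ** 2))
    return r_log, r_lin

for t in range(1, 6):
    print(t, residuals(t))
```

Both residuals should vanish up to floating-point rounding at every $t$, since the relations are exact for expected proportions.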

In the discussion to follow it will often be helpful to let $B = P(0,0)$ and $Y_t = P(0,t) - P(0)P(t)$ in 4.2-3 and 4.2-8. These equations then become

4.2-9    $\ln Y_t = \ln\left[B - P^2(0)\right] - kt$

4.2-10    $P(0)P(0,t) = -aB + B\,P(t) + a\left[Y_t + P^2(0)\right]$

It is important to recall that in 4.2-3, 4.2-8, 4.2-9, and 4.2-10 $P(t)$ and $P(0,t)$ represent expected response proportions. That is, the Cohesive Elements Model implies these exact linear relations on functions of the expected response proportions.

4.2.2  Sources  of  Error 

Thus far the fundamental regression equations contain no error. The equations express exact linear relationships which would hold if the model were a completely accurate description of the process and the variables were measured without error. It would be most surprising if the Cohesive Elements Model were actually a complete and accurate description of such a complex process as the dynamics of response in the binary choice case. Rather, it seems more reasonable to consider the model as a first order approximation to the true process.

In recognition of the fact that the Cohesive Elements Model is not an exact description, but rather an approximate model, the fundamental relations may be revised in order to reflect this fact. We now have

4.2-11    $\ln Y_t = \ln\left[B - P^2(0)\right] - kt + U_{1t}$

4.2-12    $P(0)P(0,t) = -aB + B\,P(t) + a\left[Y_t + P^2(0)\right] + U_{2t}$

where $U_{1t}$ and $U_{2t}$ represent the true errors at time $t$ in Equations 4.2-11 and 4.2-12, respectively. The errors which we have been considering thus far are termed errors in equations or specification errors in the econometric literature. It should be noted that we have assumed additive specification errors in 4.2-11 and 4.2-12.⁶

Errors of this type are easily handled by least squares procedures in most cases. The formulation given in 4.2-11 and 4.2-12 treats the model relations on the variables as linear approximations to the true relations.

A far more serious source of error arises because we must use observed response proportions as estimates of the expected response proportions. That is, we are faced with an errors in variables situation.⁸ Standard least squares will yield biased estimates of the regression coefficients in the presence of errors in the independent variables. Errors in the dependent variable are indistinguishable from specification errors from the standpoint of the estimation procedure.⁹ Hence 4.2-11 has no errors in variables problem since the variables error is entirely in the dependent variable, $Y_t$, while the independent variable, $t$, is non-stochastic. Note however that in the intercept term of 4.2-11 the estimate $Q^2(0)$ of $P^2(0)$ is subject to error. Since this error occurs in the intercept term of 4.2-11, we do not face an errors in variables situation when we apply least squares to 4.2-11. However, when we use the estimate of the intercept and the estimate of $P(0)$ to estimate $B$, we will have the error in the estimate of $B$ perfectly negatively correlated with the error in $P(0)$.

For 4.2-12, however, the errors in variables problem may be serious. Fortunately, as the number of respondents included in the analysis increases, the variances of the observed proportions $Q(t)$ and $Q(0,t)$ about $P(t)$ and $P(0,t)$ decrease. Thus the errors in variables problem encountered in 4.2-12 may be overcome by increasing the number of individual respondents. Similarly, the error in the estimate of $B$ due to the error in the estimate of $P(0)$ will decrease as the number of respondents increases. We will then be left with only specification error, which is considerably easier to deal with.
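The bias that errors in an independent variable induce in least squares can be seen in a small simulation. The sketch below is a generic illustration, not the Cohesive Elements Model itself: the linear relation, the noise levels, the sample size, and the function names are all assumptions. In the setting of the text the measurement noise would shrink as the number of respondents grows; here it is simply varied by hand.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def slope_with_measurement_noise(noise_sd, n=20_000, seed=1):
    """Estimate a true slope of 2.0 when the regressor is observed with added noise."""
    rng = random.Random(seed)
    x_true = [rng.uniform(0.0, 1.0) for _ in range(n)]
    y = [2.0 * x + rng.gauss(0.0, 0.05) for x in x_true]   # small equation error only
    x_obs = [x + rng.gauss(0.0, noise_sd) for x in x_true]  # error in the variable
    return ols_slope(x_obs, y)

for noise_sd in (0.0, 0.1, 0.3):
    print(noise_sd, round(slope_with_measurement_noise(noise_sd), 3))
```

As the assumed measurement noise grows, the estimated slope is pulled toward zero, whereas noise confined to the dependent variable leaves the slope estimate centered on its true value.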

4.2.3  Estimating  the  Simultaneous  Equations 

It is clear from 4.2-9 and 4.2-10 that $Y_t$ is an endogenous variable in this two equation system. That is, $Y_t$ is the dependent variable in 4.2-9 and an independent variable in 4.2-10. Consequently we must treat these equations as a simultaneous system.¹⁰

Simultaneous equations often lead to very formidable estimation problems. In the present case the problem is greatly simplified by the fact that 4.2-11 and 4.2-12 form a recursive system of equations. A two equation recursive system has the following general form

$-y_1(t) + b_{11}x_1(t) + \cdots + b_{k1}x_k(t) + u_1(t) = 0$

$a_{12}y_1(t) - y_2(t) + b_{12}x_1(t) + \cdots + b_{k2}x_k(t) + u_2(t) = 0$

where $t$ indexes observations, $y_1$ and $y_2$ are endogenous variables, and $x_i$, $i = 1, \ldots, k$, are predetermined variables. If we examine the matrix of coefficients of the endogenous variables $y_1$ and $y_2$, we see that we have the following triangular matrix

$\begin{bmatrix} -1 & 0 \\ a_{12} & -1 \end{bmatrix}$

The triangularity of the matrix of coefficients of the endogenous variables distinguishes a recursive system. It further indicates that $y_1$ should be the dependent variable in the first equation while $y_2$ should be the dependent variable in the second.¹² It is easily seen that 4.2-11 and 4.2-12 form a recursive system. The implications of 4.2-11 being log-linear are discussed below. For a more systematic development of recursive systems and their properties see Klein (1953, Chpt. 3).

For a recursive system, if the errors between the equations of the system are independent (i.e., if $U_{1t}$ and $U_{2t'}$ are independent for all $t$ and $t'$), then single equation least squares applied to each equation in turn will be identical to maximum likelihood estimates.¹³ If the errors are not independent, then single equation least squares applied to each equation in turn is no longer identical to maximum likelihood procedures. The equations may still be estimated sequentially, however. In this case the estimates will maintain the property of consistency even though they may no longer be fully efficient.¹⁴ Thus as the correlation between the errors of the two equations goes to zero, the estimates obtained by the recursive single equation procedure will tend toward maximum likelihood estimates.

At this point it is necessary to clarify what is meant by applying single equation procedures in turn to each equation. In the present case this means that we first estimate 4.2-11 by standard least squares. We then use the values of $Y_t$ estimated by this equation as our endogenous variable in 4.2-12. The recursive nature of the system enables us to first estimate the endogenous variable as the dependent variable of one equation and then to use the estimated values of this variable as an independent variable in the second equation.¹⁵

Before presenting a summary statement of the procedure for estimating the two recursive equations, it is useful to express 4.2-12 in an alternative form. That is,

4.2-13    $P(0)P(0,t) - B\,P(t) = a\left[Y_t - B + P^2(0)\right] + U_{2t}$

Clearly 4.2-13 is a regression constrained to pass through the origin, given that we know $B$ as well as $P(0)$, $P(0,t)$, $P(t)$, and $Y_t$. But recall that the recursive procedure discussed in the preceding paragraph requires that we first estimate 4.2-11. The estimated intercept of this equation is just

$\ln\left[P(0,0) - P^2(0)\right] = \ln\left[B - P^2(0)\right]$

Thus we shall have an estimate of $B = P(0,0)$ to use in 4.2-13 as a result of estimating 4.2-11.

The question remains as to what statistical properties this estimate of $P(0,0) = B$ will possess. Recall that if the errors in the two equations of the system are uncorrelated and if we make the standard regression assumptions with normally distributed errors, then the estimate of the intercept in 4.2-11 will be the maximum likelihood estimate¹⁶ of

$\ln\left[B - P^2(0)\right] = \ln\left[P(0,0) - P^2(0)\right]$

The operation of taking a logarithm is a strictly monotonic or one to one transformation of its argument. Since maximum likelihood estimates are invariant under one to one transformations, we will have a maximum likelihood estimate of $P(0,0)$ in this case. If the errors between the equations are not independent, then the estimate of $\ln\left[P(0,0) - P^2(0)\right]$ will be consistent but not maximum likelihood. But consistent estimates are invariant under continuous transformations¹⁸ and the operation of taking a logarithm is also a continuous transformation. Hence in this case we will have a consistent estimate of $P(0,0)$ to use in 4.2-13. Thus at worst we will be substituting a consistent estimate into 4.2-13,¹⁹ while at best this estimate will be maximum likelihood.
The advantage of this recursive procedure is further underscored when one considers the problems inherent in 4.2-12 due to the fact that the intercept term of this equation is $-aB$ while the regression coefficients of the independent variables are $a$ and $B$. Fortunately the estimation of 4.2-11 prior to 4.2-12 yields an estimate of $B$ having desirable statistical properties. Using this estimate of $B$, we are then able to use 4.2-13 rather than 4.2-12 to estimate the parameter $a$. If this recursive procedure were not available, the problem of estimating the equations would be hopelessly complex. This becomes clear when we note that the intercept of 4.2-11 contains $\ln\left(B - P^2(0)\right)$ while 4.2-12 is subject to the non-linear constraint that its intercept equals $-aB$. The $B$ must be the same in both equations. While it is possible to speculate about iterative procedures to satisfy the relations between and within these equations, the statistical properties of such estimates would be most difficult, if not impossible, to obtain. All this has been by way of underscoring our good fortune that the simultaneous regression relations for the Cohesive Elements Model form a recursive system.

At this point it seems appropriate to summarize the three step regression procedure to follow in estimating the two recursive simultaneous equations of the Cohesive Elements Model.²⁰

Recursive Regression Procedure

Step 1.  Estimate 4.2-11 by classical least squares to obtain estimates of $k$ and $\ln\left[P(0,0) - P^2(0)\right]$. Be certain that the number of respondents included in the analysis is sufficiently large so that the errors in variables problem may be safely neglected.

Step 2.  From the estimate of the intercept, $\ln\left[P(0,0) - P^2(0)\right]$, in 4.2-11 and from $Q(0)$, the observed proportion which is the sample estimate of $P(0)$, solve for $B = P(0,0)$. Denote this estimate by $\hat{B}$.

Step 3.  From the estimated relation in Step 1 compute the values of $Y_t$ which would be estimated at each value of $t$ given the estimated parameters. Denote this estimate by $\hat{Y}_t$. Then using $\hat{Y}_t$ and $\hat{B}$ estimate 4.2-13 in the following manner:

$Q(0)Q(0,t) - \hat{B}\,Q(t) = a\left[\hat{Y}_t - \hat{B} + Q^2(0)\right] + e_{2t}$

where now $e_{2t}$ is the sample estimate of the error measured from the "true" relation and $Q(t)$ and $Q(0,t)$ are sample estimates of $P(t)$ and $P(0,t)$, respectively.
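The three steps above can be sketched in a few lines of code. This is a minimal illustration rather than the procedure as originally programmed: the function names, the use of the observed $Q(0,t) - Q(0)Q(t)$ as the dependent variable in Step 1, and the test data (exact model moments from assumed parameter values) are all assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Least squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def recursive_regression(q0, q, q0t):
    """Return (a_hat, k_hat, b_hat) from Q(0), [Q(1..T)] and [Q(0,1..T)]."""
    ts = list(range(1, len(q) + 1))
    y_obs = [q0t_i - q0 * q_i for q_i, q0t_i in zip(q, q0t)]
    if min(y_obs) <= 0.0:
        raise ValueError("log-linear step needs Q(0,t) - Q(0)Q(t) > 0 for every t")
    # Step 1: regress ln Y_t on t; the slope is -k, the intercept is ln[B - P^2(0)].
    intercept, slope = fit_line(ts, [math.log(y) for y in y_obs])
    k_hat = -slope
    # Step 2: solve the intercept for B using Q(0) in place of P(0).
    b_hat = math.exp(intercept) + q0 ** 2
    # Step 3: fitted Y_t, then regression through the origin for a (4.2-13).
    y_hat = [math.exp(intercept - k_hat * t) for t in ts]
    lhs = [q0 * q0t_i - b_hat * q_i for q_i, q0t_i in zip(q, q0t)]
    rhs = [y - b_hat + q0 ** 2 for y in y_hat]
    a_hat = sum(r * l for r, l in zip(rhs, lhs)) / sum(r * r for r in rhs)
    return a_hat, k_hat, b_hat

# Usage on error-free moments generated from hypothetical parameters.
p0, p00, a, k = 0.40, 0.25, 0.60, 0.30
decays = [math.exp(-k * t) for t in range(1, 9)]
q = [p0 * d + a * (1 - d) for d in decays]
q0t = [p00 * d + a * p0 * (1 - d) for d in decays]
print(recursive_regression(p0, q, q0t))
```

On error-free input the sketch should return the generating values of $a$, $k$, and $B$ up to rounding. With real sample proportions the estimates are unconstrained, so the theoretical bounds of Section 4.1 are not enforced here.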


4.2.4  Problems  in  the  Regression  Procedure 

Recall that in Section 4.1 we noted certain inherent theoretical constraints on the parameter values. See Equations 4.1-10, 4.1-11 and 4.1-12 and, in addition, Lemma 3. The estimation procedure recommended above in Section 4.2.3 does not incorporate these constraints. Should any of these constraints be significantly violated by the unconstrained estimates, the model itself would be called into question. For example, data could yield an estimate of $a > 1$, which is theoretically impossible under the assumption of the Cohesive Elements Model. See 4.1-11. Such a value implies an equilibrium or steady-state choice share for response $A_1$ versus response $A_2$ of over one hundred percent. By a significant violation of a theoretical constraint we shall mean an over- or underestimate of a parameter the order of magnitude of which would be very unlikely to result from sampling fluctuations about a true parameter value which, in fact, satisfies the constraint.

We could, of course, seek to resolve this problem by attempting to solve the quadratic programming problem which will result if we should seek to minimize the squared error subject to these theoretical constraints. Goldberger (1964, pp. 261-62) refers to some results by Zellner relating to this problem.²¹ Unfortunately, Goldberger does not present Zellner's results on the sampling distribution of such estimates.²² In any case, Goldberger does give Zellner's recommendation to set the estimated parameter equal to the constraining value it has violated whenever unconstrained least squares yields an infeasible estimate. Anticipating the results of Section 5.6 somewhat, we note that an estimate of $a = 1.25 > 1$ was obtained for one of the groups used in the empirical application of the model. Following Zellner we set the estimate of $a$ at 1.0 in this case.
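The recommendation just described amounts to truncating an infeasible estimate at the bound it violates. A minimal sketch follows. The text reports this adjustment only for $a$; applying the same rule to $k$ and $B$ through 4.1-10, 4.1-12, and Lemma 3 is an assumption made here for illustration, and the function name is hypothetical.

```python
def clamp_to_constraints(a_hat, k_hat, b_hat, q0):
    """Set each unconstrained estimate to the bound it violates, if any."""
    a = min(max(a_hat, 0.0), 1.0)       # 4.1-11: 0 <= a <= 1
    k = max(k_hat, 0.0)                 # 4.1-10: k >= 0
    b = min(max(b_hat, q0 * q0), q0)    # 4.1-12 and Lemma 3, with Q(0) standing in for P(0)
    return a, k, b

# The a = 1.25 case mentioned in the text; the other three values are made up.
print(clamp_to_constraints(1.25, 0.30, 0.25, 0.40))
```

Estimates that already satisfy the constraints pass through unchanged, so the rule only alters infeasible values.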

Another  problem  in  the  regression  procedure  is  the  measure  of 
"goodness  of  fit"  of  the  Cohesive  Elements  Model  to  the  data.  At 
first  glance,  it  seems  natural  to  use  the  correlation  coefficients 
(or  rather  the  square  of  the  correlation  coefficients)  as  the  measure 
of  the  fit  of  the  model  to  the  data  in  each  equation.   It  turns  out 
that  there  are  pitfalls  in  this  approach. 

For the "$k$" estimating equation (4.2-11) the correlation coefficient measure is rather ambiguous. In this case it measures whether $k$ is significantly different from zero, but this is not identical to a test of the model. An example will illustrate this point. Suppose that the Cohesive Elements Model really is the model which generates a particular set of observations. Further suppose that in this case we are dealing with a group of individual respondents each of whom experiences no change in his response probability between successive responses. That is, we now have a group of respondents who are stationary with respect to their response probabilities. This coincides with Morrison's (1965b) quasi-stationary Bernoulli universe. In terms of the Cohesive Elements Model this is the case where $k = \alpha+\beta = 0$. But note that if the true value of $k$ is zero, the regression given as 4.2-11 should yield an estimated correlation coefficient very close to zero. If this correlation coefficient were to be taken as a measure of the fit of the model, the conclusion in this case would be the erroneous one that the Cohesive Elements Model does not fit the data. The correct conclusion would be that the model is valid with $k=0$. Hence neither the correlation coefficient in 4.2-11 nor a test of the significance of the estimated slope, $k$, necessarily measures the veracity of the Cohesive Elements Model in an empirical situation.

In the case of the "a" estimating equation (4.2-13) we have a
regression through the origin.²³ "Goodness of fit" is not measured
by the correlation coefficient in this case. For example, suppose
we had data which gave us the estimated regression relation shown in
Figure 4-1. From the figure it is clear that the estimated intercept,
a, is significantly greater than zero. We also see from the
figure that the correlation between X and Y is r=0.90, a very high
correlation indeed. But if our theory suggests a linear relation
between X and Y which passes through the origin, we see immediately
that an estimated relation such as that given in Figure 4-1 would be
inconsistent with our theory in spite of the high correlation between
X and Y. Thus the correlation coefficient does not provide an
adequate measure of "goodness of fit" when the regression implied by
our theory must pass through the origin. We need another measure.

FIGURE 4-1
HYPOTHETICAL REGRESSION RELATION

[Figure: estimated regression line Y = a + bX with a positive
intercept a, plotted against X; r = 0.90.]


The above discussion and Figure 4-1 suggest a procedure which
tests the estimated intercept to see whether or not it is significantly
greater than zero. Should the estimated intercept be significantly
greater than zero, the data will be inconsistent with the model whatever
may be the value of the correlation coefficient. In summary, then,
we could use the following procedure for testing the fit of the model
to any given set of data:

Estimate 4.2-13 without the constraint that the intercept be
zero. Test the estimated intercept to see if it differs significantly
from zero. If it does, we can reject the model.
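
The intercept test just described can be sketched in Python as follows. This is only an illustration under the standard regression assumptions: the function name and the data are hypothetical, and the sketch returns the t statistic and its degrees of freedom rather than a significance level, which would be read from a t table.

```python
import math


def intercept_t_statistic(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, t, df), where t is the t statistic for the hypothesis
    that the intercept a is zero and df = n - 2 is its degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s2 = sse / df  # residual variance
    se_a = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return a, b, a / se_a, df


# Hypothetical data with a clearly positive intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 3.9, 5.2, 5.8, 7.1]
a, b, t, df = intercept_t_statistic(x, y)
```

A large absolute value of t relative to the tabled critical value for df degrees of freedom would lead to rejection of the regression through the origin, whatever the value of r.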

If the estimated intercept does not differ significantly from zero, we
might be tempted to use r, the correlation coefficient, as a further
measure of the fit of the model to the data. However, since "a" equal
to zero is a legitimate parameter value and since a test of significance
for r>0 is equivalent to a test on a>0, this procedure is
fraught with the same ambiguities which plagued the use of r in the
"k" estimation equation as a measure of "goodness of fit."

In  summary,  then,  we  see  that  the  recursive  regression  estimation 
procedure  does  not  yield  any  clear-cut  measure  of  the  "goodness  of  fit" 
of  the  model  to  a  set  of  data.   Fortunately,  the  minimum  chi  square 
procedure  which  is  developed  in  the  next  section  does  yield  a  single, 
well  defined  measure  of  the  overall  fit  of  the  model  to  a  set  of  data. 

Furthermore, the regression procedure places rather stringent
demands upon a data base in order for the estimator to have desirable
statistical properties. In the first place, a large number of individual
respondents must be available in order to diminish the errors
in variables problem to negligible proportions. Secondly, the asymptotic
property of consistency holds as the number of responses per
respondent increases. This requirement is troublesome in terms of
both data availability and possible non-stationarity in the change
process itself. Thus the regression procedure requires both a large sample
of respondents and a large number of responses per respondent in order
for the asymptotic properties of the estimator to hold. In any empirical
situation where we have available only a few responses per respondent
the "nice" asymptotic properties of the regression estimates are
little comfort.

Finally, estimation of 4.2-11 will degenerate if Y = P(0,t)-P(0)P(t) ≤ 0,
since the natural logarithm of such an argument is not defined. A
nonpositive value for Y may result for one of two reasons:

1.  sampling error in the estimators of P(0), P(t), and P(0,t),

2.  the model doesn't describe the data very well.

As the number of respondents increases, the sampling error in the
estimators of P(0), P(t), and P(0,t) decreases. Thus the likelihood of
degeneracy resulting from sampling error diminishes as the sample size
increases. Preliminary empirical experience with MRCA dentifrice data
suggests that degeneracy is likely to occur when the number of respondents
is less than fifty. But with a sample this small the errors in
variables problem is itself sufficiently serious to invalidate the use
of the recursive regression procedure. The second source of degeneracy
may result if extreme changes in response probability are common or if
P(0,t) is very small.
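
The degeneracy condition can be checked before the regression is attempted. The sketch below is illustrative only: the function name is hypothetical, and the observed proportions Q(0), Q(t), and Q(0,t) are assumed to stand in for P(0), P(t), and P(0,t).

```python
import math


def log_covariance_term(q0, qt, q0t):
    """Return ln(Y) for Y = Q(0,t) - Q(0)Q(t), or None when Y <= 0.

    A return value of None flags a degenerate observation for which
    the logarithm is not defined.
    """
    y = q0t - q0 * qt
    if y <= 0.0:
        return None
    return math.log(y)


# Hypothetical proportions: the first case is usable, the second degenerate.
usable = log_covariance_term(0.5, 0.5, 0.30)
degenerate = log_covariance_term(0.5, 0.5, 0.20)
```

Counting how often None is returned across time periods gives a rough indication of whether a sample is too small for the regression procedure.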

4.3  Minimum  Chi  Square  Estimation 

Our objective in this section is to formulate a minimum chi square
procedure for estimating and testing the Cohesive Elements Model.²⁴ The
minimum chi square procedure provides a test of the model in the
"goodness of fit" sense as well as parameter estimates which have desirable
asymptotic properties as the number of respondents increases.

The subsequent discussion will first review the minimum chi square
procedure and the properties of the estimates which may be obtained
from this method. Then attention will be turned to the formulation of
a minimum chi square estimation procedure for the Cohesive Elements
Model. The final topic will relate to the numeric minimization of the
chi square statistic.

4.3.1  The Minimum Chi Square: Method and Properties

Suppose we have a situation in which each of M independent individuals
makes one of K mutually exclusive and collectively exhaustive
responses. Suppose further that each individual has some probability
of making each of the K response alternatives; that is, the actual
response of an individual will be an outcome from a probability process.
Further suppose that we have a model of this response situation which
yields predictions as to the expected proportion of these respondents
who will make each of the alternative responses. In general, the model
will have certain parameters which determine these expected response
proportions. A measure of the fit of the model to any appropriate set
of observations is given by the well known "goodness of fit" statistic

4.3-1    \[ X^2 = M \sum_{i=1}^{K} \frac{[\nu_i - \pi_i(\xi)]^2}{\pi_i(\xi)} \]

where i indexes the set of mutually exclusive and collectively exhaustive
response alternatives, $\nu_i$ is the observed proportion of the
M respondents making response $A_i$, and $\pi_i(\xi)$ is the expected proportion
of the M respondents making response $A_i$ as a function of $\xi$, the vector
of parameters of the model. We want to find $\xi$ such that

4.3-2    \[ \min_{\xi} X^2 = \min_{\xi} M \sum_{i=1}^{K} \frac{[\nu_i - \pi_i(\xi)]^2}{\pi_i(\xi)} \]

That is, our objective is to find the vector of parameters, $\xi$, that
provides the best agreement between the model and the data as measured
by $X^2$.
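
As a small illustration, the statistic 4.3-1 can be computed directly from observed and expected proportions. The Python sketch below uses hypothetical names and data; it evaluates the statistic for one fixed set of expected proportions and does not perform the minimization over the parameters.

```python
def chi_square(observed, expected, m):
    """Goodness of fit statistic of 4.3-1 for proportions.

    observed and expected are sequences of K proportions, each summing
    to one; m is the number of respondents.
    """
    return m * sum((v - p) ** 2 / p for v, p in zip(observed, expected))


# Hypothetical example with K = 3 response alternatives and M = 100.
x2 = chi_square([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], m=100)
```

Here two cells each contribute 0.01/0.4 and the third contributes nothing, so the statistic is approximately 5.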

It has been shown²⁵ that minimizing $X^2$ given by 4.3-1 with respect
to $\xi$ will yield estimates of the parameter vector which have desirable
statistical properties. Suppose for the moment that we have a model
containing L parameters. Then minimization of $X^2$ with respect to each
element of the parameter vector will yield the following system of L
equations which would have to be solved for the elements of the parameter
vector if an analytic solution is to be obtained:


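4.3-3    \[ -\frac{1}{2M} \frac{\partial X^2}{\partial \xi_j} = \sum_{i=1}^{K} \left[ \frac{\nu_i - \pi_i(\xi)}{\pi_i(\xi)} + \frac{[\nu_i - \pi_i(\xi)]^2}{2\,\pi_i(\xi)^2} \right] \frac{\partial \pi_i(\xi)}{\partial \xi_j} = 0 \]

where j = 1, 2, ..., L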


An  alternative  system  of  equations  which  under  general  conditions  has 
the  same  limiting  distribution  as  4.3-3  as  the  number  of  respondents, 
M,  increases  is  known  as  the  modified  minimum  chi  square  procedure. 
The  minimizing  system  of  derivatives  in  this  case  is  just 


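4.3-4    \[ \sum_{i=1}^{K} \frac{\nu_i - \pi_i(\xi)}{\pi_i(\xi)} \, \frac{\partial \pi_i(\xi)}{\partial \xi_j} = 0 \]

where j = 1, 2, ..., L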


It should be noted that 4.3-4 is identical to the maximum likelihood
system of equations.²⁶ Using either 4.3-3 or 4.3-4 to obtain estimates
of $\xi$, we will have $X^2$ distributed asymptotically as a chi square random
variable having K-L-1 degrees of freedom.²⁷ For the minimum chi square
procedure the asymptotic properties hold as the number of individual
respondents increases, i.e., as M → ∞.

The parameter estimates which are obtained from 4.3-3 or 4.3-4 are
best asymptotically normal (BAN) estimates of the model's parameters.²⁸
BAN estimates possess the following properties, where $\hat{\xi}_i$ is the
estimate of the true parameter $\xi_i^0$:

(i)   $\hat{\xi}_i$ is a consistent estimate of $\xi_i^0$. That is, for any $\epsilon > 0$,

      \[ \lim_{M \to \infty} P\{ |\hat{\xi}_i - \xi_i^0| > \epsilon \} = 0 \]

(ii)  As M → ∞ the distribution of $\hat{\xi}_i$ tends to $N(\xi_i^0, \sigma^2/M)$, where
      $N(\mu, \sigma^2)$ denotes a normal distribution having mean $\mu$ and
      variance $\sigma^2$.

(iii) $\hat{\xi}_i$ is an efficient estimate of $\xi_i^0$ from the class of all
      consistent and asymptotically normal estimates. That is,
      there is no estimate which satisfies (i) and (ii) and
      which also has a smaller sampling variance than $\hat{\xi}_i$.

Thus the minimum chi square estimation procedure provides parameter
estimates having desirable asymptotic properties. In addition, the
minimum value of the $X^2$ statistic may be used to measure the fit of
the model in the "goodness of fit" sense.²⁹


4.3.2  Minimum  Chi  Square  Estimates  of  the  Cohesive  Elements  Model 

Recall from the reparameterization in Section 4.1 that the expected
proportion giving response A1 at time t is given by 4.1-7. Similarly,
the expected proportion of respondents making response A1 at time zero
and again at time t is given by 4.1-8. These equations represent the
expected response proportions for t = 0, 1, 2, ..., T under the hypothesis
that the Cohesive Elements Model is a valid model for the data which
were actually observed.

For any time t a convenient format in which to consider the expected
response proportions is that of a 2 x 2 association table such
as that given as Table 4-1.

TABLE 4-1
EXPECTED RESPONSE PROPORTIONS AT TIMES 0 AND t

                                   TIME t
                     A1              A0                    Total
TIME 0   A1          P(0,t)          P(0)-P(0,t)           P(0)
         A0          P(t)-P(0,t)     1-P(0)-P(t)+P(0,t)    1-P(0)
         Total       P(t)            1-P(t)                1.0

By the hypothesis of the model, the observations are cross-sectionally
independent.³⁰ That is, the model assumes that individuals respond
independently of one another on any given response occasion.

The expected proportions in each cell of Table 4-1 are functions
of the parameters P(0), P(0,0), a, and k. We will thus need four
degrees of freedom to estimate the parameters. Since the cell proportions
must sum to one, a single tabulation such as that in Table
4-1 will only have 4-1=3 degrees of freedom. We see immediately that
a single tabulation of the expected response proportions is not sufficient
to identify the model.

In order to be able to estimate the parameters of the model, we
find that we must utilize more than one such association table. The
Pseudo-Bernoulli Trials Axiom, R2, assures us that under the null
hypothesis that the model is "true" the successive responses of each
individual respondent will be stochastically independent. Hence we
may obtain a series of independent association tables under the null
hypothesis of the Cohesive Elements Model. For example, with a sequence
of five responses we would have the four tables given as Table
4-2. The independence of these tables allows us to sum the chi square
statistics computed from each table as well as their respective degrees
of freedom. In formulating these chi square statistics we must use
the observations and expected proportions corresponding to the cells
in the tabulations of Table 4-2. It is not appropriate to mix cell
frequencies and marginal frequencies when computing the chi square
statistics.³¹

For a sequence of five responses from each respondent the expected
proportions as a function of the parameters appear in Table 4-3. The
parameters are P(0), P(0,0), a, and k. Of these, only P(0) is directly
estimable by means of the observed proportion Q(0).³²

Before turning to the specification of the $X^2$ statistics for each
table, it is well to consider the question of degrees of freedom. Recall
that we are letting Q(t) and Q(0,t) denote the observed proportions
corresponding to the expected proportions P(t) and P(0,t), respectively.
TABLE 4-2
FIVE-RESPONSE ASSOCIATION TABLES

                                   TIME 1
                     A1              A0                    Total
TIME 0   A1          P(0,1)          P(0)-P(0,1)           P(0)
         A0          P(1)-P(0,1)     1-P(0)-P(1)+P(0,1)    1-P(0)
         Total       P(1)            1-P(1)                1.0

                                   TIME 2
                     A1              A0                    Total
TIME 0   A1          P(0,2)          P(0)-P(0,2)           P(0)
         A0          P(2)-P(0,2)     1-P(0)-P(2)+P(0,2)    1-P(0)
         Total       P(2)            1-P(2)                1.0

                                   TIME 3
                     A1              A0                    Total
TIME 0   A1          P(0,3)          P(0)-P(0,3)           P(0)
         A0          P(3)-P(0,3)     1-P(0)-P(3)+P(0,3)    1-P(0)
         Total       P(3)            1-P(3)                1.0

                                   TIME 4
                     A1              A0                    Total
TIME 0   A1          P(0,4)          P(0)-P(0,4)           P(0)
         A0          P(4)-P(0,4)     1-P(0)-P(4)+P(0,4)    1-P(0)
         Total       P(4)            1-P(4)                1.0
TABLE 4-3
EXPECTED PROPORTIONS AS A FUNCTION OF THE PARAMETERS
SEQUENCE OF FIVE RESPONSES

P(0)    =  P(0)
P(1)    =  P(0)e^{-k}   +  a(1 - e^{-k})
P(2)    =  P(0)e^{-2k}  +  a(1 - e^{-2k})
P(3)    =  P(0)e^{-3k}  +  a(1 - e^{-3k})
P(4)    =  P(0)e^{-4k}  +  a(1 - e^{-4k})

P(0,1)  =  P(0,0)e^{-k}   +  P(0)a(1 - e^{-k})
P(0,2)  =  P(0,0)e^{-2k}  +  P(0)a(1 - e^{-2k})
P(0,3)  =  P(0,0)e^{-3k}  +  P(0)a(1 - e^{-3k})
P(0,4)  =  P(0,0)e^{-4k}  +  P(0)a(1 - e^{-4k})
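
The entries of Table 4-3 follow one pattern in t, which a short Python sketch can make explicit. The function name and the parameter values used below are hypothetical and serve only to illustrate the formulas as tabulated.

```python
import math


def expected_proportions(p0, p00, a, k, T):
    """Return (P(t) for t = 0..T, P(0,t) for t = 0..T) as in Table 4-3."""
    p_t, p_0t = [], []
    for t in range(T + 1):
        decay = math.exp(-k * t)  # weight on the initial values
        p_t.append(p0 * decay + a * (1.0 - decay))
        p_0t.append(p00 * decay + p0 * a * (1.0 - decay))
    return p_t, p_0t


# Hypothetical parameter values for a sequence of five responses (T = 4).
p_t, p_0t = expected_proportions(p0=0.40, p00=0.25, a=0.60, k=0.5, T=4)
```

At t = 0 the functions return P(0) and P(0,0) themselves, and as t grows P(t) moves toward a while P(0,t) moves toward P(0)a.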
Suppose that we have a sequence of five responses by each individual
respondent. This is the case considered in Table 4-2 and Table 4-3.
In Table 4-2, if we know Q(t), t=0,...,4, and Q(0,t), t=1,...,4, we
have sufficient information to enter the observed proportion in each
cell of each of the four association tables. Since these nine observations
are sufficient to identify the observed response proportions
in each cell of these tables, the four association tables given
as Table 4-2 have a total of nine degrees of freedom for error. If
we knew the values of the parameters (a, k, P(0), and P(0,0)), we
could use all nine degrees of freedom to test the model. In general,
we will have to estimate the model's parameters from the data. The
degrees of freedom for error are reduced by one for each parameter
which we must estimate. For the case given in Table 4-2 we would
have 9-4=5 degrees of freedom remaining for error since we must estimate
four parameters. Hence for sequences of five consecutive responses
the minimum value of $X^2$ may be tested against a chi square
statistic having five degrees of freedom. For every response added
or deleted from the sequence, two degrees of freedom are gained or
lost, respectively. As is true in the regression procedure of Section
4.2, at least three responses must be available for each respondent
in order to achieve identification in the model.
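
The degrees of freedom count above can be written as a one-line calculation. The Python sketch below is an illustration with a hypothetical function name: a sequence of T+1 responses supplies T+1 values of Q(t) and T values of Q(0,t), and four parameters are estimated.

```python
def degrees_of_freedom(n_responses, n_params=4):
    """Degrees of freedom for error with n_responses per respondent."""
    T = n_responses - 1
    n_observed = (T + 1) + T  # Q(0..T) plus Q(0,1..T)
    return n_observed - n_params


df_five = degrees_of_freedom(5)   # the case of Table 4-2
df_three = degrees_of_freedom(3)  # the shortest identifiable sequence
```

With only two responses per respondent the count is negative, which reflects the statement that at least three responses are needed for identification.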

We turn now to the formulation of the chi square statistic for
the Cohesive Elements Model. For a sequence of T+1 responses from
each individual in the sample, we have
4.3-5

\[ X^2 = M \sum_{t=1}^{T} \left[ \frac{(Q(0,t)-P(0,t))^2}{P(0,t)} + \frac{(Q(0)-Q(0,t)-P(0)+P(0,t))^2}{P(0)-P(0,t)} + \frac{(Q(t)-Q(0,t)-P(t)+P(0,t))^2}{P(t)-P(0,t)} + \frac{(Q(0,t)-Q(0)-Q(t)+P(0)+P(t)-P(0,t))^2}{1-P(0)-P(t)+P(0,t)} \right] \]

where the P(t)'s and the P(0,t)'s are given by 4.1-7 and 4.1-8,
respectively. If we estimate a, k, P(0), and P(0,0) from the data,
the $X^2$ in 4.3-5 is distributed as a chi square having 2T-3 degrees
of freedom provided our sample size is large enough for the asymptotic
results to hold. Hence the empirical $X^2$ statistic may be tested
against a chi square statistic having 2T-3 degrees of freedom.³³

In  the  above  formulation  we  were  able  to  sum  the  chi  square 
statistics  from  the  association  tables  of  Table  4-2  because  the  Cohesive 
Elements  Model  (in  this  case  our  null  hypothesis)  postulates  cross- 
sectional  as  well  as  over  trials  independence  of  responses. 
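
A hedged Python sketch of the statistic 4.3-5 follows. It assumes the expected proportions take the form tabulated in Table 4-3 and that the sum over tables is scaled by the number of respondents M, as in 4.3-1. All function names and the numerical values are hypothetical.

```python
import math


def model_proportions(p0, p00, a, k, t):
    """Expected P(t) and P(0,t) in the form tabulated in Table 4-3."""
    decay = math.exp(-k * t)
    p_t = p0 * decay + a * (1.0 - decay)
    p_0t = p00 * decay + p0 * a * (1.0 - decay)
    return p_t, p_0t


def table_cells(margin_0, margin_t, joint):
    """Four cell proportions of a 2 x 2 association table."""
    return [
        joint,
        margin_0 - joint,
        margin_t - joint,
        1.0 - margin_0 - margin_t + joint,
    ]


def chi_square_cem(q, q_joint, p0, p00, a, k, m):
    """X^2 of 4.3-5, summed over the T association tables.

    q[t] is Q(t) for t = 0..T; q_joint[t - 1] is Q(0,t) for t = 1..T;
    m is the number of respondents.
    """
    total = 0.0
    for t in range(1, len(q)):
        p_t, p_0t = model_proportions(p0, p00, a, k, t)
        observed = table_cells(q[0], q[t], q_joint[t - 1])
        expected = table_cells(p0, p_t, p_0t)
        total += sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return m * total


# Hypothetical observed proportions for five responses (T = 4).
q = [0.40, 0.47, 0.52, 0.55, 0.57]
q_joint = [0.25, 0.245, 0.243, 0.241]
x2 = chi_square_cem(q, q_joint, p0=0.40, p00=0.26, a=0.60, k=0.5, m=500)
```

Only cell proportions enter each table's contribution; the marginal proportions are used solely to derive the cells, in keeping with the caution against mixing cell and marginal frequencies.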


4.3.3  Numeric  Minimization  of  the  Chi  Square  Statistic 

While  we  could  express  4.3-5  in  terms  of  the  parameters  and 
then  formally  attempt  to  minimize  the  resulting  expression,  it  is 
clear  from  the  non-linear  nature  of  the  expected  proportions  that 
such  an  endeavor  will  not  yield  analytically  tractable  estimation 
equations.   Fortunately,  the  advent  of  the  high  speed  digital  computer 
has  made  numeric  minimization  of  4.3-5  computationally  feasible. 
One approach would be that of an exhaustive grid search of the
four dimensional parameter space of the Cohesive Elements Model. Such
a procedure has been applied in mathematical psychology.³⁴ A variant
of this approach would be to perform a two stage grid search. The
first stage would apply a rather gross grid in order to determine the
general locus of the minimum. Then the second stage would apply a
finer grid in the neighborhood of the minimum $X^2$ found in stage 1.
Multi-stage generalizations of this procedure are also feasible.
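
The two stage grid search can be sketched generically in Python. This is an illustration, not the procedure of Section 5.4: the function names, the number of grid points, and the stand-in objective are assumptions. In an application the objective would be the $X^2$ of 4.3-5 viewed as a function of the parameters.

```python
import itertools


def grid(lo, hi, n):
    """n equally spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def grid_search(objective, bounds, n_points):
    """Exhaustive search over the grid defined by bounds."""
    best_point, best_value = None, float("inf")
    axes = [grid(lo, hi, n_points) for lo, hi in bounds]
    for point in itertools.product(*axes):
        value = objective(*point)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value


def two_stage_grid_search(objective, bounds, n_points=11):
    """Coarse grid first, then a finer grid around the coarse minimum."""
    coarse_point, _ = grid_search(objective, bounds, n_points)
    fine_bounds = []
    for (lo, hi), x in zip(bounds, coarse_point):
        step = (hi - lo) / (n_points - 1)
        # Search one coarse step on either side, staying inside the bounds.
        fine_bounds.append((max(lo, x - step), min(hi, x + step)))
    return grid_search(objective, fine_bounds, n_points)


def example_objective(a, k):
    """Stand-in objective with its minimum at a = 0.37, k = 0.62."""
    return (a - 0.37) ** 2 + (k - 0.62) ** 2


point, value = two_stage_grid_search(example_objective, [(0.0, 1.0), (0.0, 1.0)])
```

The number of objective evaluations grows as the number of grid points raised to the number of parameters, which is why tight parameter bounds matter for a four parameter search.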

An iterative procedure could also be developed using a system of
derivatives such as 4.3-3 or 4.3-4. The derivatives would be evaluated
at the current values of the parameters. These derivative values
would then be a signal as to which parameter to change in what
direction on each iteration as the procedure seeks the vector $\xi$ which
will minimize $X^2(\xi)$ given as 4.3-5. Care must be taken to ensure that
the procedure doesn't lead to a local minimum. A perturbation of the
parameter values could be used to guard against this.
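
One way such a derivative-guided iteration might look is sketched below in Python. It is an assumption-laden illustration rather than the author's procedure: derivatives are approximated by central differences instead of 4.3-3 or 4.3-4, only the parameter with the largest derivative is changed on each iteration, and the stand-in objective is hypothetical. The perturbation step mentioned above is not implemented; it could be added by restarting the search from perturbed starting values.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central difference approximation to the gradient of f at x."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def derivative_guided_search(f, start, step=0.1, tol=1e-6, max_iter=10000):
    """Change one parameter per iteration in its downhill direction.

    The step is halved whenever a move fails to reduce f, and the
    search stops once the step falls below tol.
    """
    x = list(start)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        grad = numerical_gradient(f, x)
        j = max(range(len(x)), key=lambda i: abs(grad[i]))
        trial = list(x)
        trial[j] -= step if grad[j] > 0 else -step
        f_trial = f(trial)
        if f_trial < fx:
            x, fx = trial, f_trial
        else:
            step /= 2.0
    return x, fx


def example_objective(x):
    """Stand-in objective with its minimum at (0.3, 0.7)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


solution, minimum = derivative_guided_search(example_objective, [0.0, 0.0])
```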

In order to reduce the exhaustive grid search to manageable
proportions, we must first consider explicit and reasonable constraints
on the parameter values. In Section 4.1 we examined the three explicit
constraints on the parameters (see 4.1-10, 4.1-11, and 4.1-13).

The parameter k is unconstrained from above (see 4.1-10). Recall
that k is a measure of the rate at which the model will approach
equilibrium from any non-equilibrium position. In Table 4-4 we have
tabulated $e^{-kt}$ and $1-e^{-kt}$ versus time for k=1,2,3. The term $1-e^{-kt}$
is the weight given to "a" in determining P(t) as a linear combination
of P(0) and "a" (see 4.1-7). We see from Table 4-4 that P(t) approaches
"a" very quickly, even when k=1. In most practical applications
of the model it is highly unlikely that k will exceed one.


TABLE 4-4
e^{-kt} AND (1 - e^{-kt}) VERSUS t

            k=1                    k=2                     k=3
 t    e^{-t}   1-e^{-t}     e^{-2t}   1-e^{-2t}     e^{-3t}   1-e^{-3t}

 1    0.368    0.632        0.135     0.865         0.050     0.950
 2    0.135    0.865        0.018     0.982         0.002     0.998
 3    0.050    0.950        0.002     0.998         0.000     1.000
 4    0.018    0.982        0.000     1.000         0.000     1.000
 5    0.007    0.993        0.000     1.000         0.000     1.000
 6    0.002    0.998        0.000     1.000         0.000     1.000
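
The entries of Table 4-4 can be regenerated with a few lines of Python. The function name is hypothetical; the sketch simply evaluates the two weights and rounds them to three decimals.

```python
import math


def decay_table(ks, ts, digits=3):
    """Map (k, t) to the rounded pair (e^{-kt}, 1 - e^{-kt})."""
    table = {}
    for k in ks:
        for t in ts:
            decay = math.exp(-k * t)
            table[(k, t)] = (round(decay, digits), round(1.0 - decay, digits))
    return table


table = decay_table(ks=[1, 2, 3], ts=range(1, 7))
```

Other values of k, for example values below one, can be passed in the same way to see how slowly P(t) approaches "a" in that range.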


Hence in our exhaustive grid search we would want to place a convenient
upper bound on k, generally less than one. If, perchance,
the minimizing value of k should fall on the boundary we set, we
must then relax the constraint to allow k to become somewhat larger.
Note that for k=3 the expected proportion changes almost as a
step function between t=0 and t=1. Such abrupt changes are unlikely
in most empirical situations, especially in consumer product markets.

In Chapter V we present the parameter constraints which were
used in the initial empirical test of the Cohesive Elements Model.
In addition, Section 5.4 describes the grid search minimum chi square
procedure in greater detail.

4.4  Further  Comments  on  the  Estimation  Procedures 

In  the  preceding  sections  two  alternative  estimation  procedures 
have  been  suggested.   The  present  section  reviews  certain  of  the  pros 
and  cons  of  these  procedures. 

Perhaps the major attraction of the regression procedure is its
computational ease and speed. Each of the two equations in the
regression procedure is a simple two variable regression. Numerous
efficient computer codes are available for making such computations.
On the negative side is the difficulty of assessing the "goodness of
fit" of the model in the regression case. It was pointed out in
Section 4.2.4 that if the model were actually true with a parameter
value of k=0, the correlation coefficient given in 4.2-11 would be
approximately zero. This makes use of the correlation coefficient
(or its square) unsatisfactory as a measure of the "veracity" of the
model. In addition, the fit of a regression through the origin such
as 4.2-13 may also be difficult to interpret.

In  concluding  the  discussion  of  the  regression  procedure,  it 
should  be  noted  that  the  small  sample  properties  of  the  estimates 
remain  a  problem.   In  general,  the  regression  procedure  will  require 
a  much  longer  sequence  of  responses  than  will  the  minimum  chi  square 
method. 

The advantages of the minimum chi square method are clear cut.
The estimates are BAN estimates and the asymptotic results are
achieved as the number of respondents, not the number of responses,
increases. With commercial consumer panels such as MRCA's, large
samples of respondents are very feasible. In addition, this procedure
yields a single, less ambiguous measure of the total "fit"
of the model. The principal drawback is in the enormous amount of
computation involved in numeric minimization of the $X^2$ statistic.

In terms of the long run search for knowledge of buyer behavior,
the minimum chi square procedure offers major advantages. Now that
several alternative models of consumer behavior are available, the
next step must be to systematically compare them with one another
over a wide range of product classes. Use of the minimum chi square
method to estimate alternative models will enable us to compare them
with each other. In particular, if two models generate minimum $X^2$
statistics having the same number of degrees of freedom, that model
having the smaller $X^2$ best fits this sample of data. If the degrees
of freedom differ, we may compare their respective "p" levels.³⁵

4.5  Summary

In this chapter we presented two alternative estimation procedures
for the Cohesive Elements Model. The recursive regression
method yields somewhat ambiguous measures of "goodness of fit" in
addition to small sample parameter estimates for which we only know
the large sample properties. The minimum chi square procedure provides
a single, unambiguous measure of the "fit" of the model as
well as parameter estimates with known statistical properties. When
computationally feasible, the minimum chi square method is to be
preferred.

Footnotes 

1.  While  Coleman  (1964a,  pp.  96-97)  recognized  the  need  for  better 
estimation  procedures  and  for  methods  of  testing  the  model,  he 
left  the  development  of  these  methods  and  procedures  to  future 
research.   In  most  instances  he  seemed  content  with  methods 
which  would  just  identify  the  parameters.   In  no  case  did  he 
consider  the  properties  of  the  estimates  or  the  adequacy  of  the 
fit  of  the  model  to  empirical  data. 

2.  For those familiar with electrical engineering, α + β represents
a rate analogous to the response constant of an electric circuit.

3.  Cramer  (1946,  p.  207). 

4.  See  Cramer  (1946,  pp.  215-16).  There  are  more  general  cases  of 
the  central  limit  theorem.  See  Parzen  (1960,  Chpt.  10). 

5.  Coleman (1964a, Chpt. 3) gives 4.2-3. In none of his work on
this model does he present 4.2-8. In general, Coleman (1964a,
Chpt. 3 and 5 and 1964b, Chpt. 13) simply identifies the parameters
from the data. That is, he uses sequences of three
responses per respondent to solve for the model parameters. This
procedure is just sufficient to identify the parameters and leaves
no degrees of freedom for error. The lone exception would seem
to be Coleman's use of Cohen's data (Coleman 1964a, Chpt. 3,
Section 1). In this case he used 4.2-3 as a regression relation
to estimate k. He was not concerned with the properties of this
estimate of k nor with any statistical measure of the "goodness
of fit" of his model. The parameter a was also neglected.

6.  This assumption of additive specification errors is made for
convenience and is not implied by the theory.

7.  See  Johnston  (1963).   See  especially  Chapters  7  and  8  for  problems 
of  autocorrelation,  heteroscedasticity,  and  lagged  variables. 

8.  For  a  treatment  of  errors  in  variables  and  the  problems  which 
they  generate  see  Klein  (1953,  pp.  282-305),  Johnston  (1963, 
Chpt.  6),  or  Goldberger  (1964,  Chpt.  6). 

9.  See  Massy  in  Frank,  Kuehn,  and  Massy  (1962,  pp.  83-85). 
10.  It is well known that simultaneous equations cannot be treated
as separate regressions without subjecting the results of the
analysis to considerable error. See, for example, Johnston
(1963) and Goldberger (1964).

11.  See Klein (1953, pp. 80-92 and 100-122), Johnston (1963, pp.
265-66), and Goldberger (1964, pp. 354-356) for further information
regarding recursive systems of equations.

12.  Thus,  in  a  recursive  system  the  direction  in  which  to  minimize 
the  sum  of  the  squared  errors  is  clear  in  each  equation  of  the 
system. 

13.  Provided, of course, that the standard regression assumptions,
including the assumption of normally distributed errors, are made.

14.  See  Klein  (1953,  pp.  112-113). 

15.  See Klein (1953, Chapter 3, Section 2) for the logic of this
procedure in a recursive system of this nature.

16.  See  Johnston  (1963,  Chpt.  1). 

17.  See  Mood  (1950,  Chpt.  8). 

18.  See  Goldberger  (1964,  p.  130). 

19.  This  is  not  uncommon  practice  in  econometric  applications. 

20.  This estimation procedure is patterned after that given by Klein
(1953, Chpt. 3, Sections 2 and 4) for recursive systems.

21.  This procedure is termed restricted least squares in the
econometrics literature.

22.  To the present author's knowledge, these results have not been
published except in working paper form at the Netherlands Institute
for Economics, Rotterdam. At the time this report was prepared,
this paper was not available to the present author for
comment.

23.  See Brownlee (1960, pp. 298-302) for statistical tests of
regressions through the origin.

24.  Note that a minimum chi square procedure could be developed for
other cases of the General Latent Markov Model, that is, for
alternative specifications of $\lambda_i$ and $\mu_i$.
25.  See Cramer (1946, pp. 424-434 and p. 506) and Neyman (1949).

26.  See  Cramer  (1946,  p.  426). 

27.  See  Cramer  (1946,  p.  427). 

28.  See  Neyman  (1949). 

29.  Note that the minimum chi square procedure obtains the best fit
of the model to the data in much the same sense as least squares
regression obtains the best fit. For further applications of the
method of minimum chi square to the problem of estimating stochastic
models of consumer brand choice behavior see Massy (1965)
and Morrison (1965b).

30. It was noted in Chapter II that this assumption is valid for
applications involving the use of consumer panel data. The method
by which panel members are recruited generally assures us that
they will be sufficiently distributed geographically to make this
assumption substantially correct. The empirical results reported
in Chapter V are all based upon consumer panel data.

31.  See  Yule  and  Kendall  (1945,  p.  413). 

32. We may estimate P(0) by the observed proportion Q(0). This
procedure was used in developing the results reported in Chapter V.

33.  The  above  procedure  essentially  considers  the  respondents  from 
whom  we  obtained  observations  to  be  a  random  sample  from  some 
larger  population  of  respondents  in  which  our  interest  centers. 
We  merely  wish  to  use  the  sample  to  draw  inferences  to  this 
larger  population.   This,  in  essence,  corresponds  to  Morrison's 
(1965b)  concept  of  universe  parameter. 

34.  See  Atkinson,  Bower,  and  Crothers  (1965,  Chpt.  9). 

35. See Atkinson, Bower, and Crothers (1965, pp. 387-395) for
elaboration of this point.


Chapter  V 
SOME  EMPIRICAL  RESULTS 

The primary purpose of this chapter is to present an initial
empirical test of the Cohesive Elements Model which was developed
in Chapters II and III. In addition, the empirical analysis will
enable us to contrast the estimates which are obtained from the
minimum chi square and regression procedures. Prior to the
presentation of the actual results, we need to consider certain
topics of both general and specific empirical interest. We first
consider the particular empirical situation which will serve as a
test of the model. Operational problems are discussed next.
Empirical household classification and inclusion rules are then
presented, followed by a summary of the grid search procedure for
minimizing the chi square goodness of fit statistic. We then report
the results of an application of the Cohesive Elements Model to a
consumer product market when the minimum chi square estimation
procedure was used. The final section compares regression estimation
results to those obtained by the minimum chi square procedure.

5.1 A Dynamic Market: The A.D.A. Endorsement of Crest

The data which will be utilized to test the Cohesive Elements
Model are the Market Research Corporation of America's National
Consumer Panel records of dentifrice purchases for the period
immediately preceding and the period subsequent to the American
Dental Association endorsement of Crest as a decay preventative
dentifrice. In this section discussion will center on this dynamic
market situation and the reasons why it is of interest as a first
test of the Cohesive Elements Model. For an evaluation of consumer
panels as a source of marketing data see Boyd and Westfall (1960).

The unprecedented American Dental Association endorsement of
Crest toothpaste on August 1, 1960, gave Crest an unusual
differentiating appeal in the dentifrice market. Prior to this
endorsement Crest had obtained a respectable but unspectacular share
of the dentifrice market since its introduction in the mid-1950's.
While a portion of the market prior to this endorsement had
undoubtedly been aware of the potential benefit of a fluoride
toothpaste, the endorsement by a non-commercial, professional group
gave Crest a legitimated differential appeal. This legitimated
appeal, coupled with an intensive promotional campaign, swept Crest
to the forefront² in the dentifrice market in a matter of a few
months. Hence the dentifrice market was in a state of considerable
transition in the period immediately following the A.D.A.
endorsement.

There are several reasons for choosing this market situation as
an initial test of the Cohesive Elements Model. First, the A.D.A.
endorsement of Crest allows us to segment the response data into
before and after periods which are meaningful in terms of the
behavior represented by the model. The before period represents a
relatively normal market situation where the dynamics of response
probability exhibit no particularly strong trend. On the other
hand, the significant market impact of the A.D.A. endorsement of
Crest gives us an after period in which the market was in a rapid
state of transition. In the latter case, the response probability of
many individuals was undergoing very rapid and significant change.
Thus the dentifrice market in the periods just before and just after
the Crest endorsement affords us the opportunity to test the Cohesive
Elements Model in both a "normal" and a "transient" period.

The behavior of the model in each of these periods is of
considerable interest from the standpoint of testing its empirical
viability. It might turn out that the model breaks down in either
the "transient" or the "normal" market case for reasons which are
not clear a priori. The MRCA dentifrice data have two principal
advantages from the standpoint of examining the model's performance
in these contrasting market situations. In the first place, the
dentifrice data for both time periods are for the same product class
and the same set of brands. If the "normal" period were for one
product classification and set of brands, and the "transient" period
were for yet another, the behavior of the model in each of these
contrasting market situations would be confounded with the question
of the behavior of the model in different product classes.³ Hence,
if the model fits well in one period but not in the other, this
failure may more reasonably be attributed to a breakdown of the model
in that type of market situation, since we have used the same product
in both cases. Secondly, the MRCA data provide continuous purchasing
records for the same group of households in the "transient" as in
the "normal" period. Should the model perform poorly in one of
the two cases, this control of respondent to respondent variability
should once again enable us to more reasonably suspect the model's
appropriateness in the type of market situation where it broke down.
Thus the use of the dentifrice data centered about the ADA
endorsement provides a bonus opportunity to study the performance of
the model under two contrasting market conditions for the same
product class and the same sample of respondents.

The second major advantage of using the MRCA dentifrice data
to test the Cohesive Elements Model is that the brand of interest,
Crest, was already an established brand at the time of the ADA
endorsement in August, 1960. Hence, in both the before and after
periods Crest was available on a fully distributed basis.⁴ Thus,
lack of availability should not be a reason for a consumer choosing
some brand other than Crest.⁵

Third, dentifrice is purchased frequently enough to yield
sequences of brand choices which are long enough to provide ample
degrees of freedom for testing the model.

Fourth, MRCA's National Consumer Panel has several thousand
member households. Such a large sample of households will enable
us to segment the sample into meaningful groups and still have
enough respondents in each group to have confidence that the
asymptotic properties of the estimates hold.

Finally, the Market Research Corporation of America agreed
to make its dentifrice data from the National Consumer Panel
available for use in this study.⁶

5.2 Operational Considerations

5.2.1  Time  Problems 

In the Cohesive Elements Model an individual's response
probability is dynamically described by a continuous time stochastic
process. However, implicit in the aggregation and estimation
procedures developed in Chapter III and Chapter IV is the assumption
that all respondents respond at the same points of time. Recall that
the response proportions are to be computed at $t = 0, 1, 2, \ldots, T$.
A diagrammatic illustration of the response occasions and the
evolution of response probability for a typical respondent should
serve to clarify the situation. See Figure 5-1.

FIGURE 5-1
RESPONSE OCCASIONS AND EVOLUTION OF RESPONSE
PROBABILITY FOR A TYPICAL RESPONDENT

[Figure: X(t) = P(A at time t) plotted against time; * denotes a
choice occasion.]

We see from Figure 5-1 that the response probability X(t) changes
in continuous time while response occasions occur at regular periodic
intervals. In the figure, response occasions are denoted by *'s.
This figure points up an important clarification which must be made
with respect to our use of the term "response probability." By this
we mean the probability that a respondent will exhibit response A
(as opposed to the alternative response), given that he makes a
response. In this sense, then, our use of the term "response
probability" is as the probability that a respondent will make a
particular response, A, if he responds at time t at all. Formally,
we may express this as the following conditional probability
statement:

X(t) = P(A at time t | a response is made at time t).

The  necessity  to  assume  equal  increments  of  time  between  response 
occasions  is  troublesome  in  certain  applications,  particularly  with 
reference  to  applications  which  utilize  consumer  panel  data.   It  should 
be  noted  that  in  some  applications  of  the  Cohesive  Elements  Model  the 
response  occasion  assumption  would  create  no  difficulties  since  the 
nature  of  the  application  dictates  that  all  respondents  make  their 
responses  at  the  same  periodic  intervals.  An  example  of  such  a  case 
would  be  the  multiwave  panel  studies  of  voting  behavior  which  have 
been  conducted  by  sociologists. 

When the model is applied to continuous consumer panel data, the
assumption is more artificial. Unfortunately, consumers are not so
obliging as to all make their purchase decisions in some product class
on identical days, or even with identical time intervals between their
purchase decisions. Thus when we apply the model to sequences of
choices among brands and make the assumption of equal increments of
time between response or purchase occasions, the correspondence
between model time and real time between purchase events is somewhat
elastic. Another diagrammatic illustration should serve to clarify
this point. See Figure 5-2.

In Figure 5-2 we have two respondents, A and B. Purchase events
occurring along the time axis are denoted by $P_i$, where $i$ indexes
the purchases made. In the illustration given in Figure 5-2 both
respondents make three successive purchases. The time elapsed between
purchases $i$ and $i+1$ is denoted by $\Delta T_{R_i}$ for the real
time case and $\Delta T_{M_i}$ for the model time case.

For respondent A there is no problem in the correspondence
between real (calendar) time and model time. For this case, the
purchase events in the real world situation occurred at equal
intervals, i.e., $\Delta T_{R_1} = \Delta T_{R_2}$. Since the model
assumes that $\Delta T_{M_1} = \Delta T_{M_2}$ for all respondents,
the fact that $\Delta T_{R_1} = \Delta T_{R_2}$ in this case enables
us to achieve a one-to-one relationship between real and model time
by simply considering
$\Delta T_{R_1} = \Delta T_{R_2} = \Delta T_{M_1} = \Delta T_{M_2}$.

Respondent B represents both the far more typical case and the
case where the relation between real time and model time becomes
somewhat elastic. As has been previously pointed out, the model
implies that $\Delta T_{M_1} = \Delta T_{M_2}$. However, in the figure
it is clear that $\Delta T_{R_1} \neq \Delta T_{R_2}$. Yet the
purchase events in the model are intended to correspond to the
purchase events in the real world process. Consequently
$\Delta T_{M_1}$ corresponds to $\Delta T_{R_1}$ and $\Delta T_{M_2}$
corresponds to $\Delta T_{R_2}$. But since
$\Delta T_{M_1} = \Delta T_{M_2}$ and
$\Delta T_{R_1} \neq \Delta T_{R_2}$, it is clear that the relation of
$\Delta T_{M_1}$ to the real time interval $\Delta T_{R_1}$ must
differ from that of $\Delta T_{M_2}$ to $\Delta T_{R_2}$. This
elasticity in the model time scale will prove troublesome whenever
the model is applied to data in order to measure the impact of a
marketing stimulus which operates in real time, such as a pulse of
advertising. We shall refer to this aspect of the time problem as the
intra-respondent time problem.

FIGURE 5-2
RELATIONSHIP BETWEEN REAL TIME AND MODEL TIME
FOR THE COHESIVE ELEMENTS MODEL

[Figure: for Respondent A and for Respondent B, a real time axis and
a model time axis, each marked with the two intervals between the
three successive purchases.]

Another  aspect  of  the  time  problem  is  the  fact  that  the  model 
aggregates  over  individual  respondents  who  have  not  all  made  their 
first,  second,  third,  etc.  purchases  at  the  same  time.   The  model 
treats  them  as  if  they  did.   This  aspect  of  the  time  problem  is 
termed  the  inter-respondent  time  problem. 
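
The elasticity of the model time scale may be illustrated by a small
computation. The sketch below is not part of the original analysis:
the purchase dates are invented, the function names are hypothetical,
and each model interval is assumed to have unit length. It computes
the number of calendar days which each model time step must represent
for an equally spaced respondent and for an unequally spaced one.

```python
from datetime import date


def real_intervals(purchase_dates):
    """Calendar days elapsed between successive purchases (real time)."""
    return [(later - earlier).days
            for earlier, later in zip(purchase_dates, purchase_dates[1:])]


def days_per_model_unit(purchase_dates, model_step=1.0):
    """Calendar days represented by each of the (equal) model time steps.

    model_step is an assumed unit length for every model interval.
    """
    return [gap / model_step for gap in real_intervals(purchase_dates)]


# Hypothetical respondents with three purchases each (invented dates).
respondent_a = [date(1960, 9, 1), date(1960, 10, 1), date(1960, 10, 31)]
respondent_b = [date(1960, 9, 1), date(1960, 9, 15), date(1960, 10, 31)]

scale_a = days_per_model_unit(respondent_a)  # equal real intervals
scale_b = days_per_model_unit(respondent_b)  # unequal real intervals
print(scale_a, scale_b)
```

For the first respondent every model step corresponds to 30 days,
while for the second the two steps correspond to 14 and 46 days, so
a stimulus dated in calendar time falls at different points on the
model time scale of each respondent.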

An operational approach to minimizing the impact of this aspect
of the timing problem is to group respondents according to their
average interpurchase time. This should bring the purchase cycles
of the members of each group somewhat closer together. This approach
is taken in the empirical test of the Cohesive Elements Model
reported below. Perhaps a better approach would be to group
respondents both according to their average interpurchase time and
their relative variability of interpurchase time. Relative
variability is defined in this case as the variance of the
interpurchase time divided by the square of the average interpurchase
time. While cross classification on these two dimensions should bring
the purchase cycles within each group closer together, sample size
requirements increase very rapidly. This consideration led to the
adoption of the simpler classification by average interpurchase time
for the purpose of this initial test of the Cohesive Elements Model.
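
The two classification measures just defined can be sketched in a few
lines. This is an illustration under stated assumptions rather than
the procedure actually programmed for this study: the function names
are hypothetical, the population variance is used (the text does not
say whether the sample or population form was intended), and the
default breakpoints of 30, 45, and 60 days are those of the Case I
groups in Table 5-2, with fractional averages assigned to the first
group whose upper bound they do not exceed.

```python
from statistics import mean, pvariance


def interpurchase_summary(gaps_in_days):
    """Return (average interpurchase time, relative variability).

    Relative variability = variance / (average squared), as defined in
    the text. Population variance is an assumption of this sketch.
    """
    avg = mean(gaps_in_days)
    rel_var = pvariance(gaps_in_days) / avg ** 2
    return avg, rel_var


def assign_group(avg_days, upper_bounds=(30, 45, 60)):
    """Group number (1, 2, ...) by average interpurchase time.

    Averages above the last bound fall in the final, open-ended group.
    """
    for group, bound in enumerate(upper_bounds, start=1):
        if avg_days <= bound:
            return group
    return len(upper_bounds) + 1


# Hypothetical household with gaps of 20 and 40 days between purchases.
avg, rel_var = interpurchase_summary([20, 40])
group = assign_group(avg)
print(avg, rel_var, group)
```

A cross classification on both dimensions would simply pair the group
number above with a second grouping on the relative variability.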

There may be factors which countervail the intra- and
inter-respondent time problems discussed above. Should response to a
marketing stimulus relate more to product use or purchase occasions
than to calendar time, the model's equal purchase interval assumption
may be more appropriate than calendar time intervals. Response to a
consumer product innovation might be an example where the
inter-respondent and intra-respondent time problems might not be
serious. The time scale of the diffusion process⁸ for the innovation
may well be related to units of product used rather than to calendar
time. Even though advertising messages concerning the innovation are
received in calendar time, the degree to which they are perceived may
be a positive function of usage rates. Thus the diffusion and
adoption process may well progress as far in two weeks for frequent
purchasers of the product class as in two months for less frequent
purchasers.

Whether the elastic nature of the relationship between real
(calendar) time and model time is a pitfall or a blessing is likely
to depend upon both the specific product class in question and the
type of stimulus whose effects are to be measured by the model. In
any event the elastic relationship between model time and calendar
time should not preclude the use of this model in the continuing
attempt to understand the mechanisms of consumer brand choice
behavior.

5.2.2 Operational Definition of Brand Choice Events

In empirical applications of the Cohesive Elements Model to
consumer brand choice, it is necessary to have an unambiguous
definition of a purchase decision or brand choice event. The fact
that consumers may and do make multiple purchases of the same or
different brands on the same day makes it necessary to consider a
reasonable and precise operational rule for determining brand choice
events.

After the two operational decision rules for the multiple
purchase case are presented, consideration will be given to the
justification for using these rules and to illustrative examples. The
operational rules for multiple purchases of the same or different
brands on the same day are:

Rule 1. Multiple purchases of the same brand on the same day
        will be considered to be a single brand choice decision
        for the purpose of estimating the model.

Rule 2. Multiple purchases of different brands on the same day
        will be considered to be separate brand choice decisions
        for each brand.

The first rule is justifiable in terms of the process which the
Cohesive Elements Model purports to represent. The model seeks to
describe the dynamics of brand choice behavior independent of the
amount purchased on any given purchase occasion. If a consumer
purchases several packages of a single brand on a given shopping
trip, this behavior most likely represents a single brand decision
followed by an inventory decision, i.e., the decision to purchase
more than one package of the chosen brand. While there may be a few
cases where the purchase of several packages of the same brand
represents a series of brand decisions, it seems more likely that it
is an inventory decision once the brand of purchase has been chosen.

The second rule follows the same type of logic. The fact that
a consumer purchased more than one brand on a particular shopping trip
makes it obvious that more than one brand decision has been made.
However, if the consumer purchases two packages of one of the brands
purchased on a given day, these two packages simply count as a single
brand decision consistent with the logic of Rule 1.⁹

It should be pointed out that the operational rules defined above
were applied prior to the recoding of the brands into a 0-1 process.
The recoding step codes a "brand decision" to purchase Crest as a 1
while a "brand decision" to purchase some other brand is coded as 0.
Since the operational rules are applied prior to this recoding, the
purchase of one unit of Stripe and one unit of Dr. Lyons on the same
day would be coded 00. If the recoding were done first, the
operational rules would then record this as a single brand decision 0,
which clearly would violate the notion we wished to capture in
developing our operational rules.

Perhaps  the  application  of  these  rules  is  best  clarified  by  a 
consideration  of  some  hypothetical  examples.   Three  illustrative 
cases  are  given  in  Table  5-1. 

Consumer 1 illustrates the application of Rule 1. We see that
the multiple purchase of Colgate on 8/7/60 is simply coded as a 0
while the multiple purchase of Crest on 9/25/60 is recoded as a 1.

Consumer 2 illustrates Rule 2. Here we see that he purchased
both Stripe and Crest on 9/30/60. Rule 2 tells us to recode this
as 01.

Finally, consumer 3 illustrates the notion that Rule 1 and Rule
2 are to be applied prior to the recoding of the "brand decision"
sequence into a 0-1 vector. Here we see that a purchase of both Stripe
and Dr. Lyons on 10/30/60 is recoded as two "non-Crest" purchase
events, i.e., as 00.
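
The two rules and the subsequent recoding step can be sketched as a
short program. The sketch is illustrative only and is not the coding
routine used in this study: the function names are hypothetical, the
purchase records are assumed to arrive in chronological order as
(date, brand, quantity) triples, the quantities shown are invented,
and brands bought on the same day are assumed to keep the order in
which they were recorded.

```python
def brand_decisions(purchases):
    """Apply Rule 1 and Rule 2 to (date, brand, quantity) records.

    Each distinct brand bought on a given day counts as exactly one
    brand decision, whatever the quantity or number of records.
    """
    decisions = []
    seen = set()
    for day, brand, _quantity in purchases:
        if (day, brand) not in seen:
            seen.add((day, brand))
            decisions.append(brand)
    return decisions


def zero_one_vector(purchases, target="Crest"):
    """Rules first, then recode: 1 for the target brand, 0 otherwise."""
    return "".join("1" if brand == target else "0"
                   for brand in brand_decisions(purchases))


def recode_first(purchases, target="Crest"):
    """The order the text warns against: recode, then apply the rules."""
    coded = [(day, "1" if brand == target else "0", qty)
             for day, brand, qty in purchases]
    return "".join(brand_decisions(coded))


# The three hypothetical consumers discussed above (quantities invented).
consumer_1 = [("8/7/60", "Colgate", 2), ("8/22/60", "Crest", 1),
              ("9/25/60", "Crest", 2), ("10/31/60", "Ipana", 1)]
consumer_2 = [("8/25/60", "Colgate", 1), ("9/30/60", "Stripe", 1),
              ("9/30/60", "Crest", 1), ("11/14/60", "Crest", 1)]
consumer_3 = [("8/15/60", "Crest", 1), ("10/30/60", "Stripe", 1),
              ("10/30/60", "Dr. Lyons", 1), ("11/3/60", "Ipana", 1)]

for consumer in (consumer_1, consumer_2, consumer_3):
    print(zero_one_vector(consumer))
```

Under these assumptions the three consumers are coded 0110, 0011, and
1000, while recoding first would shorten the third sequence to 100.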

TABLE 5-1
HYPOTHETICAL PURCHASE SEQUENCES

Consumer   Date       Brand Purchased   Quantity   0/1 Vector*

1          8/7/60     Colgate                      0110
           8/22/60    Crest
           9/25/60    Crest
           10/31/60   Ipana

2          8/25/60    Colgate                      0011
           9/30/60    Stripe
           9/30/60    Crest
           11/14/60   Crest

3          8/15/60    Crest                        1000
           10/30/60   Stripe
           10/30/60   Dr. Lyons
           11/3/60    Ipana

[Quantity entries are largely illegible in the source.]

* A Crest "brand decision" is coded as a 1. A "brand decision"
involving the purchase of some brand other than Crest is coded as 0.

5.3 Empirical Cases and Criteria for Household Inclusion

In this initial empirical exploration of the Cohesive Elements
Model, two cases will be considered. Case I is a before-after analysis
of the impact of the ADA endorsement of Crest. As was noted in
Section 5.1, it also provides a bonus opportunity to study the
behavior of the model in two contrasting market periods. The principal
objective in Case II is to contrast the measures of fit and the
parameter estimates which are obtained from the regression procedure
to those which may be obtained from the minimum chi square method.
This purpose is best served by the increase in the number of
households which may be included in the analysis by use of the Case II
criteria for household inclusion rather than the Case I criteria. It
will be seen below that this entails no loss of rigor and is in part a
historical accident.

In any empirical application of the Cohesive Elements Model a
decision must be made as to how many successive responses are to be
included in the analysis. The greater the number of responses from
each respondent, the greater the number of degrees of freedom for
error in both the regression and the chi square estimation procedures.
In the minimum chi square method, in particular, the greater the
number of degrees of freedom the more rigorous is the test of the
model since, in classical statistical terms, the model is the null
hypothesis in this procedure. In operational terms, this means
that if data having nine degrees of freedom are consistent with
the model this is stronger evidence in favor of the model than if
the data had only five degrees of freedom. Thus the desire to
apply a rigorous test to the model suggests that a long sequence of
successive responses for each respondent be included in the analysis.

There are two major countervailing forces which encourage
restraint in the number of responses per respondent to be included
in the analysis. The first is sample size. Since all respondents must
contribute the same number of responses to the analysis, an
excessively long response sequence criterion would tend to decrease
the number of respondents who can meet the criterion and consequently
reduce the total sample size. A bias may also result from the fact
that only very heavy purchasers of the product class in question may
be able to meet the frequency criterion. The second major reason for
restraint is the fact that while the model attempts to describe the
dynamics of response probabilities, the process which is postulated
to determine these dynamics is itself assumed to be constant or
stationary over the period of interest. Since $\alpha$, $\beta$, and
$\gamma$ may change from time to time, the longer the sequence of
responses that is used, the more likely it is that the parameters of
the process itself may change. Hence sample size considerations and
non-stationarity of the change process countervail the tendency to
require long purchase sequences.

In this study an appropriate compromise purchase (or response)
sequence length was deemed to be seven consecutive purchases in any
period under study. This decision will provide nine degrees of
freedom in the estimation procedures without unduly restricting the
number of households able to qualify for analysis or running a great
risk of non-stationarity in the change process. It should be pointed
out that this was a judgmental resolution of the problem and that
no precise decision rules are available.

Recall that in Section 5.2 the time problems which arise when
the Cohesive Elements Model is applied to consumer brand choice
behavior were discussed. The time problems are particularly pronounced
for product categories such as dentifrice which exhibit extreme
variability between consumers in terms of their average interpurchase
time. A natural method to use to reduce the magnitude of this problem
is to segment the sample of consumers according to their average
interpurchase time. Such a strategy is followed in the description
of the two cases given below.

Case  I.   Before  ADA  Endorsement  Versus  After  ADA  Endorsement 

In order for a household to enter the sample for this analysis,
it had to meet the twin criteria of having reported seven purchases
of dentifrice in both the period immediately preceding and immediately
subsequent to the ADA endorsement of Crest. Those households meeting
these criteria were then subdivided according to their average
interpurchase time in the pre-ADA period. The groups are given in
Table 5-2.

Within each of these groups the empirical proportions needed to
estimate and test the Cohesive Elements Model were developed for both
the before and the after ADA periods. This before-after analysis will
enable us to test the model in a relatively normal period as well as
in the transient market of the post-ADA period. In addition, the
estimates obtained in the after period may be contrasted to the
estimates obtained in the before period in order to ascertain the
impact of the ADA endorsement and its accompanying promotional effort.

TABLE 5-2
CASE I GROUPS AND SAMPLE SIZES

Group   No. of Households in Group   Average Interpurchase Time

1       637                          0 - 30 Days
2       556                          31 - 45 Days
3       480                          46 - 60 Days
4       894                          Over 60 Days

Case  II.   After  ADA  Endorsement  Average  Time  Segments 

The purpose of this case is to contrast the results which may be
obtained from the regression procedure to those which are available
from the minimum chi square method of estimation. Recall from Chapter
IV that the greater the sample size in terms of number of households
included in the analysis, the lesser the magnitude of the errors in
variables problem in the regression procedure. The acceptance
criterion for a household to enter this analysis is that the household
has made at least seven purchases in the post-ADA period. This
criterion, of course, is less restrictive than the twin criteria
applied in Case I. Consequently, more households are able to qualify
for this analysis. Another factor which contributes to the greater
sample size in this case is the fact that MRCA absorbed the J. Walter
Thompson panel into its National Consumer Panel in 1960. As a result,
there were many households having MRCA records sufficient to meet the
Case II criterion even though they could not meet the Case I criteria
due to their recency of inclusion in the National Consumer Panel.
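
The difference between the two inclusion criteria can be sketched as
follows. This is a simplified illustration and not the screening
program used in this study: the function names are hypothetical, a
purchase dated on the day of the endorsement is assumed to belong to
the after period, the exact calendar limits of the two periods are
not modeled, and the household records are invented.

```python
from datetime import date, timedelta

ENDORSEMENT = date(1960, 8, 1)   # date of the ADA endorsement (Section 5.1)
MIN_PURCHASES = 7                # sequence length adopted in this study


def period_counts(purchase_dates):
    """Number of purchases before, and on or after, the endorsement."""
    before = sum(d < ENDORSEMENT for d in purchase_dates)
    after = sum(d >= ENDORSEMENT for d in purchase_dates)
    return before, after


def qualifies_case_i(purchase_dates):
    """Twin criteria: enough purchases in both periods."""
    before, after = period_counts(purchase_dates)
    return before >= MIN_PURCHASES and after >= MIN_PURCHASES


def qualifies_case_ii(purchase_dates):
    """Single criterion: enough purchases in the post-ADA period."""
    return period_counts(purchase_dates)[1] >= MIN_PURCHASES


# Invented records: seven monthly purchases before the endorsement and
# seven weekly purchases beginning on the endorsement date.
pre = [date(1960, month, 15) for month in range(1, 8)]
post = [ENDORSEMENT + timedelta(weeks=k) for k in range(7)]

long_standing_household = pre + post
recently_added_household = post   # e.g. no records before the endorsement

print(qualifies_case_i(long_standing_household),
      qualifies_case_i(recently_added_household),
      qualifies_case_ii(recently_added_household))
```

Every household passing the Case I screen also passes the Case II
screen, but not conversely, which is why the Case II samples are
larger.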

The  groups  defined  by  the  average  interpurchase  time  criterion 
and  their  respective  sample  sizes  are  given  in  Table  5-3. 

TABLE 5-3
CASE II GROUPS AND SAMPLE SIZES

Group   No. of Households in Group   Average Interpurchase Time

1       869                          0 - 30 Days
2       963                          31 - 45 Days
3       832                          46 - 60 Days
4       682                          61 - 75 Days
5       1388                         Over 75 Days

5.4 Numeric Minimization of $\chi^2$ by a Grid Search Technique

In this section we present a verbal description of the grid search
procedure for the approximate minimization of the chi square statistic
which was developed in Section 4.3. After presentation of the verbal
description, our attention will turn to an empirical examination of
the behavior of this statistic in each parameter dimension.

In order to simplify the discussion of the grid search procedure,
let us assume that we have a two parameter model. This entails no
loss of generality. Suppose these parameters are denoted by t and n.
Further suppose that according to our model t and n are bounded such
that they are both non-negative and less than or equal to one. The
feasible values of our parameters thus lie within the unit square
in the positive (first) quadrant of the (t,n)-plane. See Figure 5-3.


FIGURE 5-3
(t, n)-PLANE

[Figure: the unit square in the first quadrant of the (t,n)-plane,
labeled "feasible parameter values".]

Now the $\chi^2$ statistic of the model is a function of these
parameters, say

$$\chi^2 = \chi^2(t, n)$$

That is, for any feasible t-n combination there is an associated value
of $\chi^2$. If the function $\chi^2(t,n)$ were analytically tractable,
we could simply use the methods of differential calculus to find the
parameter values which yield the minimum. In general, the $\chi^2$
functions will be analytically intractable. Thus we must resort to a
numeric minimization procedure.¹⁰

Suppose we compute $\chi^2(t,n)$ at the points defined by t = 0, 1/2, 1
and n = 0, 1/2, 1. These nine points are indicated in Figure 5-4 by X's.


FIGURE 5-4
(t, n)-PLANE WITH GRID

Grid:  t = 0, 1/2, 1
       n = 0, 1/2, 1

[Figure: the unit square of the (t, n)-plane with the nine grid points
marked by X's.]


From among these nine points we now choose that (t,n) pair which
yields the smallest value of $\chi^2$. We may then take these values
of t, n, and $\chi^2$ as approximations to the corresponding values at
the true minimum. The procedure just described is termed a grid search
numeric minimization of the chi square statistic. Naturally, searching
a finer grid that contains the original points can only improve the
approximation. In this case, suppose we had searched the grid defined
by t = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 and
n = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0. Had we done
this we could have done no worse than in our initial approximation and
it is likely that we would have done better. The finer the grid,
however, the greater the computational burden. In this case we would
have had to compute the $\chi^2$ statistic at 11 x 11 = 121 points
rather than at just nine points as in the previous case. Note that the
$\chi^2$ statistic is computed for each point on the grid.
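
The procedure just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the program
used in the study: the names grid_search and toy_statistic are
hypothetical, and the quadratic toy_statistic merely stands in for the
model's $\chi^2(t,n)$ function, which is not reproduced here.

```python
from itertools import product


def grid_search(objective, grids):
    """Evaluate objective at every grid point; return (best value, best point)."""
    best_value, best_point = None, None
    for point in product(*grids):
        value = objective(*point)
        if best_value is None or value < best_value:
            best_value, best_point = value, point
    return best_value, best_point


def toy_statistic(t, n):
    # Hypothetical stand-in for the chi square function; minimum at (0.3, 0.7).
    return (t - 0.3) ** 2 + (n - 0.7) ** 2


coarse = [0.0, 0.5, 1.0]              # the nine-point grid of Figure 5-4
fine = [i / 10 for i in range(11)]    # the 11 x 11 = 121 point grid

coarse_value, coarse_point = grid_search(toy_statistic, [coarse, coarse])
fine_value, fine_point = grid_search(toy_statistic, [fine, fine])
print(coarse_point, coarse_value)
print(fine_point, fine_value)
```

Because the fine grid contains every point of the coarse grid, the fine
search can do no worse than the coarse one, at the cost of 121 rather
than nine evaluations of the statistic.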

In any application of the grid search procedure it is necessary
to specify three conditions for each parameter: an upper bound, a
lower bound, and an increment. The bounds may be either logical
consequences of the model or arbitrary points established to ensure a
finite number of computations. In the latter case it is necessary to
ensure that the parameter value which minimizes the statistic does not
fall on this artificial boundary. If it should fall on an arbitrary
boundary point, we should then redefine the interval of search for
this parameter and recompute the $\chi^2$ statistic with the new
parameter bounds. In other words, when it becomes necessary to utilize
arbitrary boundary points for a parameter, the minimum chi square must
be at a point which lies strictly inside the boundaries. The choice of
an appropriate increment is somewhat arbitrary. The finer the
increment, the better the approximation to the true minimum chi square
and its associated parameter values, but also the greater the
computational burden. One approach might be to search increasingly
finer grids until the solution is obtained to the desired degree of
accuracy.
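
The two rules above (widen an arbitrary interval whenever the minimum
lands on its boundary, then search increasingly finer grids) can be
sketched for a single parameter as follows. This is a hedged
illustration: search_interval, refine, and toy_profile are hypothetical
names, the factor of ten by which the increment shrinks is an arbitrary
choice, and the widening step is appropriate only for arbitrary bounds,
never for bounds that are logical consequences of the model.

```python
def grid_points(lower, upper, step):
    """Equally spaced points from lower to upper inclusive."""
    count = int(round((upper - lower) / step))
    return [lower + i * step for i in range(count + 1)]


def search_interval(objective, lower, upper, step, max_widenings=20):
    """Grid-search one parameter, widening arbitrary bounds until the
    minimizing point lies strictly inside the interval."""
    width = upper - lower
    for _ in range(max_widenings):
        points = grid_points(lower, upper, step)
        best_index = min(range(len(points)), key=lambda i: objective(points[i]))
        if best_index == 0:
            lower -= width            # minimum on the lower bound: extend downward
        elif best_index == len(points) - 1:
            upper += width            # minimum on the upper bound: extend upward
        else:
            return points[best_index], lower, upper
    raise RuntimeError("minimum stayed on the boundary; check the objective")


def refine(objective, lower, upper, step, rounds=3):
    """Repeat the search on ever finer grids centered on the current best point."""
    for _ in range(rounds):
        best, lower, upper = search_interval(objective, lower, upper, step)
        lower, upper, step = best - step, best + step, step / 10
    return best


def toy_profile(k):
    # Hypothetical one-parameter stand-in for the chi square function.
    return (k - 0.0337) ** 2


print(search_interval(toy_profile, 0.001, 0.02, 0.0005))
print(refine(toy_profile, 0.001, 0.02, 0.0005))
```

In this toy run the first search hits the upper bound of 0.02, so the
interval is widened before an interior minimum is accepted.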

In the empirical application of this grid search procedure which
is reported in the next section it seemed desirable to investigate the
behavior of the chi square statistic in each parameter dimension in
order to ascertain whether the $\chi^2$ statistic was well behaved. For
the chi square statistic to be well behaved it should be concave upward
(i.e., convex) as a function of some parameter p when all other
parameters are held at the values which jointly minimize the chi square
statistic. Such a situation is shown in Figure 5-5.


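FIGURE 5-5
$\chi^2(p)$ CONCAVE FOR OTHER PARAMETERS AT OPTIMUM VALUES

[Figure: $\chi^2(p)$ plotted against p as a single-trough curve whose
minimum, Min $\chi^2(p)$, occurs at p = p*.]

where p* = value of p which jointly minimizes $\chi^2$
           with respect to all parameters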


As an example of the actual behavior of the chi square statistics
which were obtained, let us anticipate one of the results of
Section 5.5. Consider group 1 in the before ADA period for Case I.
In this case the parameter bounds and increments were those shown in
Table 5-4.

The optimal values of the parameters are also presented in Table
5-4. For these values the chi square statistic was 1.933. With the
other three parameters set equal to their optimal values, $\chi^2(p)$
was computed in each parameter dimension p, where p = P(0,0), a, k.
These results are presented in Table 5-5, Table 5-6, and Table 5-7. A
check of these tables shows that the chi square statistic is indeed
well behaved in each parameter dimension.
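
The "check of these tables" can be made mechanical. The sketch below
tests whether a tabulated profile falls strictly to a single minimum
and rises strictly thereafter, which is a weaker condition than
curvature but is what a visual scan of the tables establishes. The
function name is hypothetical; the data are the chi square values
reported in Table 5-6 for a = 0.000, 0.020, ..., 0.500.

```python
def is_unimodal(values):
    """True if values strictly decrease to one minimum, then strictly increase."""
    low = values.index(min(values))
    falling = all(a > b for a, b in zip(values[:low], values[1:low + 1]))
    rising = all(a < b for a, b in zip(values[low:], values[low + 1:]))
    return falling and rising


# Chi square in the a dimension, as reported in Table 5-6.
chi_square_in_a = [
    6.466, 5.602, 4.840, 4.177, 3.607, 3.127, 2.733, 2.421, 2.189,
    2.031, 1.947, 1.933, 1.986, 2.104, 2.285, 2.527, 2.826, 3.183,
    3.593, 4.057, 4.573, 5.138, 5.751, 6.411, 7.117, 7.867,
]

minimizing_a = 0.02 * chi_square_in_a.index(min(chi_square_in_a))
print(is_unimodal(chi_square_in_a), round(minimizing_a, 2))
```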


TABLE 5-4
PARAMETER BOUNDS AND INCREMENTS
CASE I, GROUP 1, BEFORE PERIOD

Parameter   Optimum Value*     Lower Bound   Upper Bound   Increment

P(0)        Q(0) = 0.161695
P(0,0)      0.111145           [P(0)]^2      P(0)          0.005
a           0.22               0.00          0.36          0.02
k           0.0105             0.0010        0.0200        0.0005

* P(0) was set equal to Q(0), the observed proportion purchasing
Crest at time t = 0.

TABLE 5-5
CHI SQUARE IN P(0,0) DIMENSION

Chi Square  P(0,0)

1375.105  0.026145 

1098.556  0.031145 

892.169  0.036145 

731.468  0.041145 

602.325  0.046145 

496.037  0.051145 

406.967  0.056145 

331.330  0.061145 

266.512  0.066145 

210.681  0.071145 

162.549  0.076145 

121.229  0.081145 

86.145  0.086145 

56.981  0.091145 

33.662  0.096145 

16.358  0.101145 

5.516  0.106145 

1.933  0.111145 

6.885  0.116145 

22.349  0.121145 

51.407  0.126145 

98.993  0.131145 

173.401  0.136145 

289.596  0.141145 

477.560  0.146145 

807.958  0.151145 

1503.197  0.156145 

4105.585  0.161145 

TABLE 5-6
CHI SQUARE IN a DIMENSION

Chi Square  a

6.466  0.000 

5.602  0.020 

4.840  0.040 

4.177  0.060 

3.607  0.080 

3.127  0.100 

2.733  0.120 

2.421  0.140 

2.189  0.160 

2.031  0.180 

1.947  0.200 

1.933  0.220 

1.986  0.240 

2.104  0.260 

2.285  0.280 

2.527  0.300 

2.826  0.320 

3.183  0.340 

3.593  0.360 

4.057  0.380 

4.573  0.400 

5.138  0.420 

5.751  0.440 

6.411  0.460 

7.117  0.480 

7.867  0.500 

TABLE 5-7
CHI SQUARE IN k DIMENSION

Chi Square  k

4.047  -0.0200
3.843  -0.0195
3.647  -0.0190
3.461  -0.0185
3.285  -0.0180
3.119  -0.0175
2.963  -0.0170
2.817  -0.0165
2.681  -0.0160
2.556  -0.0155
2.442  -0.0150
2.339  -0.0145
2.247  -0.0140
2.166  -0.0135
2.097  -0.0130
2.040  -0.0125
1.995  -0.0120
1.962  -0.0115
1.941  -0.0110
1.933  -0.0105
1.938  -0.0100
1.956  -0.0095
1.987  -0.0090
2.033  -0.0085
2.092  -0.0080
2.165  -0.0075
2.253  -0.0070
2.355  -0.0065
2.473  -0.0060
2.605  -0.0055
2.754  -0.0050
2.918  -0.0045
3.099  -0.0040
3.296  -0.0035
3.511  -0.0030
3.743  -0.0025
3.992  -0.0020
4.259  -0.0015
4.545  -0.0010

5.5  Minimum Chi Square Results

The  complete  set  of  minimum  chi  square  statistics  for  testing  the 
"goodness  of  fit"  of  the  Cohesive  Elements  Model  in  both  Case  I  and 
Case  II  is  given  in  this  section.   In  addition,  the  parameter  estimates 
for  Case  I  are  analyzed  in  order  to  infer  aspects  of  the  behavioral 
dynamics  of  the  dentifrice  market  during  the  period  encompassing  the 
American  Dental  Association's  endorsement  of  Crest. 

Since  sequences  of  seven  purchases  are  being  used  to  estimate  the 
model  in  all  cases,  the  chi  square  statistics  given  in  this  section  may 
be  compared  to  a  chi  square  random  variable  having  nine  degrees  of 
freedom. 

In order to decrease the computational burden in the grid search
minimization of the chi square statistic, the initial sample
proportion making response A at time t = 0 was substituted for the
theoretical parameter P(0) in each case. The properties of the
estimates as well as the distribution of the sample statistic will be
unaltered by this procedure since the initial sample proportion, Q(0),
is a maximum likelihood estimator of P(0) = E[X(0)].

5.5.1  Case I: Before ADA Endorsement Versus After ADA Endorsement

In this case there are four groups each having a before and an
after sequence of purchases. Thus there are a total of eight chi
square "goodness of fit" statistics with which to evaluate the model.
In addition, the four groups yield four sets of before-after parameter
estimates for use in analyzing the impact of the ADA endorsement
and the impact of the ensuing promotional competition upon consumer
brand choice behavior in the dentifrice market.

Recall that in the chi square procedure the sample statistic is
asymptotically chi square as the number of respondents increases. The
sample sizes given for the groups in Table 5-8 are quite large.
Consequently the sample statistic should be distributed in close
approximation to a chi square random variable having nine degrees of
freedom.

In addition to sample size, Table 5-8 also reports information
as to the "goodness of fit" of the model to the data. This information
is contained in the minimum $\chi^2$ statistic and its associated
p-level. The minimum $\chi^2$ is the empirical or sample statistic
computed from the data in the manner discussed in Chapter IV, Section 3
and in this chapter. See, for example, Equation 4.3-5.

The p-level associated with the empirical $\chi^2$ statistic is
formally defined as

5.5-1    $$p = \int_{\chi^2}^{\infty} f(X)\,dX$$

where f(X) is the density of a chi square random variable having nine
degrees of freedom. That is, p is the probability of observing a
sample statistic at least as large as the $\chi^2$ that was observed
under the hypothesis that the model generated the data. A very large
or a very small value of p may be interpreted in one of two ways:

1.  The model did not generate the data, i.e., it doesn't "fit"
the data.

2.  The model fits the data but an unlikely event has occurred.

In this study the classical statistical point of view will be taken
regarding this question, which means that interpretation 1 will be
assumed if the p values tend to be either too large or too small.

TABLE 5-8

[Table illegible in the source scan. According to the text it reports,
for each Case I group in both the before and the after periods, the
sample size, the minimum $\chi^2$ statistic and its p-level, E[X(0)],
and Var[X(0)].]

Turning now to the p values in Table 5-8, one readily sees that
all but one of the $\chi^2$ statistics fall well within the region in
which one would expect these statistics to fall given that the model
could reasonably have generated these data. For example, Group 4 in
the before period has a p value of 0.86.¹³ That is, given the
distribution of the sample statistic implied by the model, the
probability of a sample statistic this low or lower is
1 - 0.86 = 0.14. Since there is a probability of 0.14 that a chi square
statistic this low or lower will be observed under the null hypothesis,
it isn't too surprising that a sample statistic this small was
observed. Hence this observation is reasonably consistent with the
Cohesive Elements Model formulation. The single exception to the
excellent fit of the model to the data occurs for Group 1 in the before
period. In this case the empirical minimum chi square statistic is so
small that it would occur less than one time in one hundred under the
null hypothesis of the Cohesive Elements Model.
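
The p-level of Equation 5.5-1 can also be evaluated directly rather
than read from tables. The sketch below uses a standard closed-form
series for the upper tail of a chi square distribution with an odd
number of degrees of freedom; it is offered as a cross-check under that
assumption, not as the method used in the study, and the function name
is hypothetical.

```python
import math


def chi_square_upper_tail(x, df=9):
    """P(chi square with odd df >= x), via the closed-form series for odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form applies to odd degrees of freedom only")
    if x <= 0:
        return 1.0
    root = math.sqrt(x)
    series = 0.0
    power = root              # root ** (2r - 1), starting at r = 1
    double_factorial = 1.0    # 1 * 3 * 5 * ... * (2r - 1)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            power *= x
            double_factorial *= 2 * r - 1
        series += power / double_factorial
    normal_part = math.erfc(root / math.sqrt(2))
    return normal_part + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


# Group 1, before period: minimum chi square of 1.933 (see Table 5-4).
p_level = chi_square_upper_tail(1.933)
print(p_level, 1 - p_level)   # upper-tail p-level and lower-tail probability
```

For the statistic of 1.933 the lower-tail probability comes out below
0.01, in line with the statement that so small a value would occur less
than one time in one hundred under the null hypothesis.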

Taken as a whole the empirical minimum chi square statistics and
their associated p values indicate that the Cohesive Elements Model
provides a remarkably good fit to the dentifrice data in both the
relatively normal pre-ADA market and the transient post-ADA market.
Naturally, one set of examples is not sufficient to establish the
model as a generally reasonable model of dynamic choice behavior, but
the above results are most encouraging.

The mean, E[X(0)], and variance, Var[X(0)], of the cross-sectional
distribution of response probability for each group in both periods
are also reported in Table 5-8. Before considering certain interesting
aspects of these data, it is useful to examine a diagram of the
response sequences and note the points at which E[X(0)] and Var[X(0)]
are measured for each of these periods. See Figure 5-6. Since sequences
of seven consecutive purchases are being used in both the before and
the after periods, each respondent must contribute a sequence of
fourteen consecutive responses to the analysis. Recall the criteria for
inclusion of a household in the sample given in Section 5.3. These
fourteen responses are broken into two sequences of seven responses
in Figure 5-6. The initial cross-sectional distributions of response
probability are given at t = 0 in both the before and the after periods.
That is, in the before period the initial distribution of response
probability is measured as of the seventh purchase of dentifrice prior
to the ADA endorsement while in the after period the distribution is
measured as of the first purchase after the endorsement.

FIGURE 5-6
RESPONSE SEQUENCES AND INITIAL DISTRIBUTIONS

[Figure: a time line divided by the ADA endorsement into a before
period and an after period, each with responses at t = 0, 1, ..., 6.
E[X(0)] and Var[X(0)] are marked at t = 0 of the before period and at
t = 0 of the after period.]

Note:  Responses occur at t = 0, 1, 2, ..., 6

In both the before and the after periods, the expected initial
probability of purchasing Crest, E[X(0)], is inversely related to the
average interpurchase time. E[X(0)] is interpreted as the expected
probability of purchasing Crest on the initial trial for an
individual randomly selected from the population. A post hoc argument
for the reasonableness of these results is as follows:

Before Period. Purchase frequency may be considered a surrogate
measure of interest in the product class of dentifrice. To the extent
that this relationship is reasonable one might expect frequent (more
interested) consumers to have noted and perhaps reacted to the
potential benefits of stannous fluoride even prior to the ADA
legitimation of this belief. That is, it is suggested that purchase
frequency is related to product class interest which in turn is related
to perception of the relatively weak stimulus of the potential benefit
of stannous fluoride prior to the endorsement.

After Period. The intergroup differences in E[X(0)] are smaller
in the after period than they are in the before period. If we assume
that frequency of purchase is related to interest which in turn is
related to the perception of relatively weak stimuli in the before
period, then we would expect the massive stimulus of the ADA
endorsement to bring the groups closer together with respect to their
expected response probabilities in the after period.

Now that we have seen that the Cohesive Elements Model yields an
excellent fit to the dentifrice data in both the before and the after
periods for Case I, it is of interest to consider certain implications
of the parameter estimates which were obtained. Estimates of k and a
are reported in Table 5-9 while the estimates of $\alpha$ and $\beta$
are reported in Table 5-10. Recall that $k = \alpha + \beta$ and
measures the extent to which individuals in the market are likely to
shift their brand purchasing probabilities between purchases.
Consequently, it is also a measure of the rate at which the market will
approach equilibrium from any disequilibrium position. The equilibrium
choice share of Crest is given by $a = \alpha/(\alpha + \beta)$. That
is, "a" represents the equilibrium proportion of choices of Crest which
we would expect. Since the choice process is stochastic, there will
remain some variance in choice share about "a" at equilibrium. Further
recall from Chapter III, Section 1, that $\alpha$ represents the
propensity for an individual's response probability to increase with
respect to purchasing Crest while $\beta$ is the propensity of this
probability to decrease. We shall consider the results on k, a,
$\alpha$, and $\beta$ in turn.

TABLE 5-9
MINIMUM CHI SQUARE PARAMETER ESTIMATES:
CASE I (k and a)

         Average
         Interpurchase
Group    Time             k_B       k_A       a_B     a_A

  1      0 - 30 Days      0.0105    0.0200    0.22    0.86
  2      31 - 45 Days     0.0650    0.0500    0.12    0.54
  3      46 - 60 Days     0.0950    0.0200    0.08    0.94
  4      Over 60 Days     0.1300    0.0500    0.12    0.50

The A subscript denotes a parameter estimate for the after
period while the B subscript denotes a parameter estimate
for the before period.
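
Writing $\alpha$ for the propensity of a response probability to
increase and $\beta$ for its propensity to decrease, the two identities
recalled above, $k = \alpha + \beta$ and $a = \alpha/(\alpha + \beta)$,
can be inverted to give $\alpha = ak$ and $\beta = (1 - a)k$. The
sketch below applies this inversion to the Table 5-9 estimates. The
resulting figures are implied by Table 5-9 and the identities; they are
not quoted from Table 5-10, and the function and variable names are
hypothetical.

```python
def propensities(k, a):
    """Invert k = alpha + beta and a = alpha / (alpha + beta)."""
    alpha = a * k
    beta = (1 - a) * k
    return alpha, beta


# (k, a) estimates by group and period, as reported in Table 5-9.
table_5_9 = {
    1: {"before": (0.0105, 0.22), "after": (0.0200, 0.86)},
    2: {"before": (0.0650, 0.12), "after": (0.0500, 0.54)},
    3: {"before": (0.0950, 0.08), "after": (0.0200, 0.94)},
    4: {"before": (0.1300, 0.12), "after": (0.0500, 0.50)},
}

implied = {
    group: {period: propensities(k, a) for period, (k, a) in periods.items()}
    for group, periods in table_5_9.items()
}

for group, periods in implied.items():
    (alpha_b, beta_b), (alpha_a, beta_a) = periods["before"], periods["after"]
    print(f"group {group}: alpha {alpha_b:.4f} -> {alpha_a:.4f}, "
          f"beta {beta_b:.4f} -> {beta_a:.4f}")
```

Under these identities the implied $\alpha$ rises and the implied
$\beta$ falls in every group between the before and the after periods.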

We see from Table 5-9 that in the before period k is directly
related to the average interpurchase time. The greater the average
interpurchase time, the more exogenous market stimuli such as
competitive advertising are likely to impinge on the consumer between
purchases. Thus we would expect less frequent purchasers to be somewhat
more volatile with respect to their brand response probabilities from
purchase to purchase. Since k is a measure of the extent to which
individuals shift their response probabilities between trials, we would
expect the relation between k and average interpurchase time which is
given in Table 5-9. While we would expect this time effect on a priori
grounds, nevertheless it is surprising that group 1, the most frequent
purchasers of dentifrice, is so much more stable with respect to its
members' response probabilities than any of the other three groups
in the before ADA period.

Comparison of a_B, the equilibrium expected choice share, and
E[X(0)], the initial expected choice share, from Tables 5-9 and 5-8
respectively reveals that for groups 2 and 3 in the before period the
trend of Crest's share was down. The parameter estimates clearly
indicate that the ADA endorsement completely reversed this trend. In
fact, all groups are estimated to have a very marked upward trend in
the post-ADA period. These results further dramatize the tremendous
impact of the endorsement.

The estimates of the equilibrium choice share, "a", in the after
period would seem unduly high if we really believed that the process
itself (i.e., $\alpha$ and $\beta$) would remain stationary into the
indefinite future. It is to be expected, however, that these early
results will be dampened by strong competitive retaliation in the form
of dealing and increased advertising. Thus these results are merely an
estimate of what Crest's equilibrium choice share would be in each of
these segments if the process did not change but were to be allowed to
run to equilibrium.

It is of interest to examine the change in $\alpha$ as we pass from
the before period into the after period. The results on $\alpha$ are
reported in Table 5-10. In all cases $\alpha$ increased markedly.
Recalling the meaning of $\alpha$, we may interpret this as an
indication that the ADA endorsement and the resulting promotional
campaign made it quite likely that an individual's probability of
purchasing Crest would increase. Note that the greatest increases in
$\alpha$ were in the two most active groups, group 1 and group 2. They
may well reflect the interest factor which was previously postulated.
Thus the most frequent purchasers of dentifrice were the most likely to
have their probability of purchasing Crest enhanced.

The results on $\beta$ are particularly interesting. Recall that
$\beta$ measures the propensity for an individual's probability of
purchasing Crest to decrease between purchases. Table 5-10 reveals that
in all groups $\beta$ decreased sharply in the after period from what
it had been in the before period. In fact, $\beta$ decreased so sharply
for groups 2, 3, and 4 that the total amount of shifting in
individuals' probabilities (measured by k) decreases as we move from
the before period to the after period. In these three cases the
decrease in $\beta$ (the likelihood that an individual's probability of
purchasing Crest will decrease) was several times the increase in
$\alpha$ (the likelihood that an individual's probability of purchasing
Crest will increase). Thus, while the endorsement made it much more
likely that an individual's probability of purchasing Crest would
increase, for groups 2, 3, and 4 the impact of the endorsement was even
more striking in the manner in which $\beta$ decreased.

The absolute changes in $\alpha$ and $\beta$ for groups 2, 3, and 4
could be interpreted in terms of a probabilistic ratchet effect on an
individual's probability of purchasing Crest. That is, the endorsement
enhanced the likelihood that this probability would, in fact, increase
and if it did increase, it would be very unlikely that it would
subsequently decrease from its new high level. Even for those
individuals experiencing no increase in their probability of purchasing
Crest, the magnitude of $\beta$ in the after period indicates that the
likelihood is small that these individuals will experience a decrease
in their probability of purchasing Crest. It might be said that this
sharp decline in $\beta$ markedly increased the loyalty of Crest
consumers. Another way of saying this is that the ADA endorsement
enhanced the retentive power of Crest as a brand in the dentifrice
market even more than it increased its attractive power.

For group 1 the change in $\alpha$ was of the same order of magnitude
as the change in $\alpha$ for other groups. The overall amount of
change, $k = \alpha + \beta$, increased since $\beta$ started from such
a low value in the pre-ADA period that it couldn't fall as sharply in
absolute terms as it could for the other three groups.

To summarize briefly, the Cohesive Elements Model suggests that
the impact of the American Dental Association's endorsement of Crest
and the subsequent promotional campaign to capitalize on this
legitimated appeal increased both Crest's attractiveness and its
retentive capacity.

The fact that the increase in retentive capacity was considerably
greater than the increase in attractiveness is a somewhat surprising
result. This is an example of a stochastic model, which fits a set of
data very well, yielding an inference which without the model probably
would not have been considered as seriously as the alternative
hypothesis that the attractive properties of the brand were the ones
most enhanced by the endorsement.

5.5.2  Case  II:   After  ADA  Average  Time  Segments 

Further evidence of the excellent fit of the Cohesive Elements Model
to the dentifrice data may be seen in Table 5-11. For groups 2 through
5 the fit is once again well within the limits where we would expect
80% of the $\chi^2$'s to fall under the null hypothesis that the model
is valid. Only group 1 has a $\chi^2$ significant at just beyond the
0.05 level. The excellence of these fits for nine degrees of freedom is
yet further encouragement as to the future potential of this model.

TABLE 5-11
MINIMUM CHI SQUARES:  CASE II

         Average
         Interpurchase    Sample
Group    Time             Size     Min. $\chi^2$   p*     E[X(0)]   Var[X(0)]

  1      0 - 30 Days       869     18.623          .03    0.2163    0.1050
  2      31 - 45 Days      963     11.816          .23    0.2150    0.0950
  3      46 - 60 Days      832      6.295          .71    0.2007    0.0900
  4      61 - 75 Days      682      4.728          .85    0.2023    0.0750
  5      Over 75 Days     1388     10.317          .33    0.1996    0.0800

* $p = \int_{\chi^2}^{\infty} f(X)\,dX$ where f(X) is the chi square
density having nine degrees of freedom. The reported values of p were
computed by linear interpolation between the p-levels given for the
chi square distribution in the Chemical Rubber Corporation's Handbook
of Mathematical Tables.

While the groups of the after period from Case I will have some
overlap with the corresponding groups of Case II, there are at least
three factors that reduce the significance of this overlap. In the
first place, the less stringent criterion for inclusion used for Case II
enabled us to achieve a significantly greater sample size, even after
defining an additional group. Another factor is the fact that the
number of groups differs between the two cases. Finally, Case I used
before ADA average interpurchase times to segment the sample, while
Case II used after interpurchase time. Thus, any family which changed
its usage of dentifrice by an appropriate amount in passing from the
before to the after period would be classified in different groups in
Case I and Case II.
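
The p-levels of Table 5-11 are described as linear interpolations
between tabulated p-levels. A sketch of that calculation follows. The
critical values used here are commonly tabulated upper-tail points of
the chi square distribution with nine degrees of freedom; which points
the handbook actually tabulated is not stated in the text, so both the
table entries and the function name are assumptions made for
illustration.

```python
# (critical value, upper-tail probability) for nine degrees of freedom.
# Assumed entries: commonly tabulated points, not taken from the handbook itself.
UPPER_TAIL_POINTS_9_DF = [
    (4.168, 0.90), (5.899, 0.75), (8.343, 0.50), (11.389, 0.25),
    (14.684, 0.10), (16.919, 0.05), (19.023, 0.025),
]


def interpolated_p_level(statistic, table=UPPER_TAIL_POINTS_9_DF):
    """Linearly interpolate the p-level between the two bracketing table entries."""
    for (x_low, p_low), (x_high, p_high) in zip(table, table[1:]):
        if x_low <= statistic <= x_high:
            fraction = (statistic - x_low) / (x_high - x_low)
            return p_low + fraction * (p_high - p_low)
    raise ValueError("statistic lies outside the tabulated range")


for group, statistic in [(1, 18.623), (2, 11.816), (3, 6.295), (4, 4.728)]:
    print(group, round(interpolated_p_level(statistic), 2))
```

With these assumed entries the interpolation agrees to two decimals
with the p-levels reported for groups 1 through 4. A table with
different tabulated points would give slightly different interpolated
values.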

Finally, we see from Table 5-8 and Table 5-11 that E[X(0)] and
Var[X(0)] exhibit essentially similar patterns across groups for Case
II as were found for the after period in Case I. That is, both E[X(0)]
and Var[X(0)] tend to vary inversely with average interpurchase time.
The range of variation in both E[X(0)] and Var[X(0)] across groups
is less for Case II than for Case I. Thus for Case II the groups were
relatively more homogeneous in their initial distribution of response
probability in the after period.

5.6  Empirical Comparison of Regression and Minimum Chi Square Estimation
     Procedures

In this section attention will be focused upon a comparison of the
regression and the minimum chi square estimation procedures. The
comparison will be made for Case II. Recall that in Section 5.3 it was
noted that Case II was developed for the express purpose of having
more households included in the analysis when we compare the regression
method of estimation to the minimum chi square procedure. The regression
results reported in this section were developed using the recursive
regression procedure outlined in Chapter IV, Section 2.

The results for the "k" estimation equation (4.2-11) are presented
in Table 5-12. Note that all the k's have the correct sign, i.e., are
non-zero. The significance of the k values is determined by a test
against the null hypothesis that k = 0 [14]. The p-levels for the
t statistics with which we test this hypothesis are presented in the
sixth column of Table 5-12. The results indicate that k is quite
significantly different from zero for groups 2, 3, and 5 and is
reasonably significant for the remaining two groups. On balance it
appears that the model yields a reasonable fit to the data as evidenced
by the k values. The interpretation problems raised in Section 4.2.4
must be borne in mind, however.
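
Footnote 14 states that testing k = 0 is equivalent to testing whether
the correlation coefficient differs from zero. The sketch below checks
that equivalence numerically, assuming that k enters as the slope of a
simple two-variable linear regression (the form of equation (4.2-11) is
not restated here, so this is an assumption). The data points and the
function names `slope_t_and_r` and `t_from_r` are made up for
illustration only.

```python
import math


def slope_t_and_r(xs, ys):
    """Return (t statistic of the slope, correlation r) for ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the residual variance uses n - 2 d.o.f.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    r = sxy / math.sqrt(sxx * syy)
    return slope / se_slope, r


def t_from_r(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical data: a log-variance-like quantity declining over occasions.
xs = [0, 1, 2, 3, 4, 5]
ys = [-2.1, -2.4, -2.6, -3.1, -3.2, -3.6]

t_slope, r = slope_t_and_r(xs, ys)
print(t_slope, t_from_r(r, len(xs)))
```

The two printed t values coincide up to rounding error, which is why a
p-level for the slope can equally be read as a p-level for r in the
two-variable case.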

It should also be noted from Table 5-12 that the intercept term,
$\ln[P(0,0) - P^2(0)] = \ln[\mathrm{Var}\{X(0)\}]$, is highly significant
at well beyond the 0.005 level, as may be seen from the exceptionally
large t values given in the eighth column.

The "a" equation (4.2-13) estimates are reported in Table 5-13.
The r's reported in this table are clearly highly significant. Recall
our discussion in Section 4.2.4. Thus any inference of "goodness
of fit" that we wish to make from the r's must first be tempered by a
test of whether or not the estimated intercepts are significantly
different from zero. The estimated intercepts, their associated t
statistics, and p-levels are reported in columns 4, 5, and 6 of Table
5-13. For groups 2, 3, 4, and 5 the estimated intercept is not
significantly different from zero, even at a very low level of
significance [16]. Consequently we may interpret the excellent r's for
these groups or equivalently their highly significant a's as evidence
in favor of the Cohesive Elements Model.

The results for group 1 suggest that the model does not provide
an adequate fit to that data. This conclusion is reached because the
estimated intercept for that group is sufficiently large that there
is less than one chance in one hundred that a value this large could
have arisen by chance from a population whose true intercept value
was zero. Consequently we conclude that the model does not fit this
data well in spite of an r = 0.92 and a highly significant value of
a. This result is consistent with the finding in Section 5.5 that
for group 1 in Case II, the model performs less well than for the
other groups. In the minimum χ² procedure this group yielded a χ²
statistic so large that we would expect to exceed this value of chi
square only three times out of one hundred given that the model is
valid. Although the two procedures yield identical results in this
case, the chi square "goodness of fit" statistic is a more convenient
summary measure of the overall "goodness of fit" of the model to the
data.

The contrasting regression and minimum chi square estimates for
k, a, Var[X(0)], α, and β are presented in Table 5-14. In the case
of Var[X(0)] the two sets of results are identical up to the limit
of the search grid (which was 0.005) in all cases but that of group
3. In that one case the error was only equivalent to one point in the
P(0,0) dimension of the grid.

When we come to the results on the change parameter, there is
more disparity between the results of the two procedures. A comparison
of the results for k and a reveals that the regression estimates,
while generally of the right order of magnitude, tend to underestimate
k and overestimate a relative to the minimum chi square results. In
fact, the underestimate of k is serious enough to make the regression
estimates of both α and β too small in spite of the tendency to
overestimate a [17].
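
The last point can be illustrated numerically. The sketch below assumes
that k = α + β and a = α/(α + β), so that α = ak and β = (1 - a)k. These
relations are suggested by the description of k as the rate of approach
to equilibrium and of a as the equilibrium choice share, but they are an
assumption here rather than a quotation from the model chapters. The
numbers are hypothetical and are not taken from Table 5-14.

```python
def alpha_beta_from(k, a):
    """Recover (alpha, beta) under the assumed relations
    k = alpha + beta and a = alpha / (alpha + beta)."""
    return a * k, (1.0 - a) * k


# Hypothetical values chosen only to show the direction of the effect.
k_chi, a_chi = 0.020, 0.30  # stand-in for minimum chi square estimates
k_reg, a_reg = 0.012, 0.36  # stand-in for regression: k lower, a higher

alpha_chi, beta_chi = alpha_beta_from(k_chi, a_chi)
alpha_reg, beta_reg = alpha_beta_from(k_reg, a_reg)

print("alpha:", alpha_chi, "vs", alpha_reg)
print("beta: ", beta_chi, "vs", beta_reg)
```

With these stand-in values the regression-style α and β both come out
smaller even though a is larger, because the proportional shortfall in
k outweighs the proportional excess in a.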

In Chapter IV we discussed the estimation procedures and the
properties of their estimates. The problems raised in Sections 4.2.2,
4.2.3, and 4.2.4 indicate that we are somewhat uncertain concerning
the properties of the parameters which we may obtain by application of
the recursive regression procedure. Consequently, it seems reasonable
to judge the regression estimates against the minimum chi square
estimates which we know to be best asymptotically normal estimators of
the model parameters. In this light it would appear from this one case
that the regression procedure:

1.  provides a reasonable estimate of P(0,0),

2.  underestimates the amount of change which is occurring in
    response probabilities, i.e., underestimates k, and

3.  overestimates the equilibrium choice share of response A
    (which in this case was Crest).

These conclusions are tentative and should not be accepted as
conclusive and final results. Further exploration of the model and the
regression procedure is required before anything approaching conclusive
results may be obtained. In any case, the regression procedure may
well prove to be useful as a method by means of which a set of quick
and reasonable "ball park" estimates of the model parameters may be
obtained.

5.7  Summary 

In this chapter we first considered the reasons for choosing the
dentifrice market as a first test of the Cohesive Elements Model. We
then discussed certain problems of general interest in the empirical
application of the model to consumer panel data. The criteria for
household inclusion and the rationale for Cases I and II were next
developed. This was followed by a discussion of the grid search
procedure for minimizing the chi square statistic. When the Cohesive
Elements Model was fitted to the dentifrice data, it was found to yield
a remarkably good fit as well as interesting interpretations concerning
the impact of the ADA endorsement of Crest. Finally, the regression
estimation procedure was found to underestimate k and overestimate a
relative to the minimum chi square procedure.

Footnotes 

1.  Crest's market share was on the order of 10% prior to the ADA
endorsement.

2.  The post-ADA market share of Crest was in excess of 30%.

3.  Naturally, the behavior of the Cohesive Elements Model in various
product classes is of considerable interest and needs to be explored
in future research. The important point about the dentifrice
market at the time we are considering it is that it provides us
with an opportunity to analyze the "transient" and "normal" market
behavior without the confounding problem of different products.

4.  The  problem  of  having  the  product  or  brand  available  to  consumers 
in  the  distribution  system  is  particularly  acute  when  one  attempts 
to  analyze  new  product  or  new  brand  introductions. 

5.  To  be  sure,  some  stock-outs  occurred  in  the  rush  to  try  Crest  after 
the  ADA  endorsement.   This,  however,  is  unlikely  to  have  had  any 
but  a  minor  influence  on  brand  choice  and  that  over  a  very  short 
interval  of  time. 

6.  I  am  particularly  indebted  to  Dr.  I.  J.  Abrams,  Vice  President- 
Technology,  of  MRCA  for  generously  supplying  the  data  at  nominal 
cost. 

7.  See  Lazarsfeld  (1954)  and  Coleman  (1964b,  Chapt.  11). 

8.  See Rogers (1962) for an excellent summary of research on the
diffusion and adoption of innovations.

9.  The order of purchase of the brands for those days on which multiple
purchases of brands occur is taken as the order in which the brand
purchases in question appear on the MRCA data tape. While this does
some violence to the model in terms of perhaps introducing a spurious
order of purchase effect, for practical purposes the incidence of
this problem is quite infrequent and consequently it does not create
any significant problems in the present empirical application. This
is fortunate since the manner in which the data are recorded and
presented would make adjustments difficult and somewhat arbitrary.

10.  Anticipating the results in Tables 5-5, 5-6, and 5-7 which show
that the χ² function is well behaved, the possibility of a nonlinear
programming solution merits consideration. Prof. Massy
reports some progress in this direction with a different stochastic
model.

11.  In Section 4.3 a general formula for computing the degrees of
freedom for the chi square statistic was developed. For sequences of
T+1 responses, the chi square statistic will have 2T-3 degrees of
freedom. Thus in the case of sequences of seven purchases, there
are 2(6)-3 = 9 degrees of freedom.

12.  See Cramer (1946), p. 525.

13.  It should be noted that this is the worst fit statistic except for
group 1 in the before period. By choosing this extreme example for
analysis, our confidence in the model formulation is enhanced even
further.

14.  This test is equivalent to testing the hypothesis that the
correlation coefficient, r, is greater than zero in absolute value. See
Johnston (1963, Chpt. 1) for details.

15.  The problem of obtaining a "measure of fit" from 4.3-11 which was
discussed in Section 4.2.3 may be responsible in part for the
relatively low r for group 1.

16.  The lower the level of significance, the more likely it is that any
given estimated intercept will be found to be significantly greater
than zero. Thus for our present purposes a low level of significance
provides a more stringent test of the model.

17.  This tendency to overestimate a could have been expected to result
in a tendency to overestimate α, also. This expected occurrence
did not develop due to the considerable underestimation of k which
occurred.


Chapter  VI 
SUMMARY  AND  FUTURE  RESEARCH 

The time has come to review where we have been, what progress has
been made in this report, and where we should go from here. Accordingly,
this chapter is subdivided into three major sections: an overall
summary, a discussion of the contributions which this report makes, and
finally some suggestions for future research.

6.1  Summary and Conclusions

From a system of axioms similar in nature to those used in stimulus
sampling theory, we developed the General Latent Markov Model. The
general model has unspecified birth and death transition intensities,
λ and μ. The Independent Elements Model was specified as a particular
case of this general model for which we could determine λ and μ.
This model was found to be an unsatisfactory representation of
consumer brand choice.

The Cohesive Elements Model was then specified. We first
demonstrated that the steady-state distribution of response probability
(Coleman's contagious binomial) tends to a beta distribution in the
limit as the number of response elements goes to infinity, i.e., as
response probability becomes a continuous random variable. The
derivation of the mean and variance of response probability revealed
that the Cohesive Elements Model represents a true stochastic change
process on the response probability. The diffusion limit of the
Cohesive Elements Model was then presented. Aggregation forms were
established in Lemma 1 and Lemma 2. A procedure for estimating the
first two raw moments of the distribution of response probability at
some response occasion t was presented next. In sum, the Cohesive
Elements Model appeared to be an interesting and viable model of consumer
brand choice on theoretical grounds.

Before the model could be empirically useful, it was necessary to
develop methods for estimating and testing it in empirical situations.
In Chapter IV we presented two alternative estimation procedures. The
recursive regression procedure has the advantage of being computationally
convenient. Unfortunately, it does not yield an unambiguous measure
of the "goodness of fit" of the model nor are the small sample
properties of the regression estimates very clear. Although computationally
burdensome, the minimum chi square estimation procedure provides a
single measure of the "fit" of the model as well as parameter estimates
which have desirable statistical properties.

When the Cohesive Elements Model was tested on MRCA dentifrice
data for a period spanning the ADA endorsement of Crest, the model was
found to yield remarkably good "fits" in both the "normal" market prior
to the endorsement and the "transient" market subsequent to the
endorsement. In addition, the model yielded an interesting
interpretation concerning the impact of the ADA endorsement on brand
loyalty toward Crest.

In sum, the Cohesive Elements Model is both theoretically and
empirically viable as a model of consumer brand choice. It also yields
measures of interest to the marketing manager such as the rate of
approach to equilibrium (the extent to which response probabilities
are shifting), the equilibrium choice share, the propensity for
response probabilities to increase or decrease toward the brand of
interest, and the distribution of response probability in the
population.

6.2  Contributions  of  this  Research 

The contributions of this report to the class of models known as
latent Markov models may be meaningfully divided into three categories:

1.  Model development.

2.  Statistical tests and estimation procedures.

3.  Empirical application.

These categories are developed in the subsections below.

6.2.1  Model  Development 

This research has contributed to the development of latent Markov
models in the following ways:

1.  A sufficient set of axioms was presented. Formal specification
    of the underlying assumptions as axioms clarifies the
    relation of the models to stimulus sampling approaches. In
    addition, the axioms demonstrate that the models are
    essentially non-stationary, heterogeneous Bernoulli models.

2.  The Cohesive Elements Model was shown to be a probability
    diffusion model as the number of response elements becomes
    infinite.

3.  Coleman's contagious binomial distribution (which is also
    the steady-state distribution of the Cohesive Elements Model
    for a finite number of response elements) was shown to go to
    a beta distribution as the number of response elements becomes
    infinite.

4.  The process of change in response probability was shown to be
    a true stochastic change process for the Cohesive Elements
    Model.

5.  The nature of the degeneracy of the Independent Elements Model
    was demonstrated.

6.  The aggregation procedure was established in Lemma 1 and
    Lemma 2.

6.2.2  Statistical  Tests  and  Estimation  Procedures 

Methods for estimating and testing the Cohesive Elements Model
have been developed in this report. The properties of the parameter
estimates and of the measures of "goodness of fit" have been presented.
Researchers should now be able to test the appropriateness of the model
in an empirical situation prior to attempting to interpret the parameter
estimates. Note that it is no longer necessary to settle for Coleman's
"just identified parameter" approach. Rather, parameter estimates
possessing desirable statistical properties are available in conjunction
with a statistical measure of the "goodness of fit."

6.2.3  Empirical Application

The initial test of the Cohesive Elements Model on the MRCA
dentifrice data demonstrates that this model is an empirically as well as
theoretically viable model of consumer brand choice. This demonstration
should spur future research into theoretical and empirical questions
of interest.

6.3  Future  Research 

The estimation procedures developed in this report do not yield
an estimate of γ. With estimates of just α and β we are only able to
estimate the expected value of the steady-state distribution of response
probability. We need an estimate of γ in order to be able to identify
the steady-state beta distribution of response probability. Since the
steady-state distribution of response probability is a useful measure,
it is of some importance that the problem of obtaining an estimate of
γ be solved.
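
A small numerical sketch may help show why two parameters fix only the
mean. It assumes, purely for illustration, that the steady-state beta
distribution has shape parameters α/γ and β/γ; this parameterization is
an assumption made here and is not restated in this chapter. The values
of α, β, and γ below are hypothetical.

```python
def beta_mean_var(p, q):
    """Mean and variance of a beta distribution with shape parameters p, q."""
    s = p + q
    return p / s, p * q / (s * s * (s + 1.0))


alpha, beta = 0.006, 0.014  # hypothetical change parameters

results = []
for gamma in (0.001, 0.01, 0.1):  # hypothetical values of the third parameter
    mean, var = beta_mean_var(alpha / gamma, beta / gamma)
    results.append((gamma, mean, var))
    print(f"gamma = {gamma:5.3f}  mean = {mean:.3f}  variance = {var:.4f}")
```

Under this assumed parameterization the mean stays at α/(α + β) for
every γ, while the variance changes with γ, so the shape of the
steady-state distribution cannot be recovered from α and β alone.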

The model is currently applicable to the binary choice (two brand)
case. If we are to be able to use the model to study competitive
dynamics in multi-brand markets, it will be necessary to extend the present
formulation to the N-chotomous case.

Finally, there is a need to begin systematic comparison of
alternative stochastic models of consumer brand choice behavior. Morrison
(1965a and b) has done some work in this direction. More is needed.
For example, it should be fruitful to test the Cohesive Elements Model,
Morrison's Brand Loyal Model, and the Linear Learning Model against
the same set or sets of data in order to see whether one of the models
clearly dominates the others.